GLOBAL EDITION


A First Course in Probability 


TENTH EDITION 


Sheldon Ross 


A FIRST COURSE IN PROBABILITY


Tenth Edition 
Global Edition 


SHELDON ROSS 


University of Southern California 


Director, Portfolio Management: Deirdre Lynch 

Courseware Portfolio Manager: Suzanna 
Bainbridge 

Courseware Portfolio Management Assistant: 
Morgan Danna 

Assistant Editors, Global Edition: Tanima Ghosh 
and Shaoni Mukherjee 

Content Producer: Tara Corpuz 

Managing Producer: Scott Disanno 

Producer: Jon Wooding 

Product Marketing Manager: Yvonne Vannatta 


Product Marketing Assistant: Jon Bryant 
Field Marketing Manager: Evan St. Cyr 
Senior Author Support/Technology Specialist: 
Joe Vetere 
Manager, Rights and Permissions: Gina Cheselka 
Cover Design: Lumina Datamatics 
Manufacturing Buyer: Carol Melville, LSC 
Communications 
Manufacturing Buyer, Global Edition: 
Kay Holman 
Cover Image: sukiyaki/Shutterstock 


Pearson Education Limited 
KAO Two 

KAO Park 

Harlow 

CM17 9SR 

United Kingdom 


and Associated Companies throughout the world 
Visit us on the World Wide Web at: www.pearsonglobaleditions.com 
© Pearson Education Limited 2020 


The rights of Sheldon Ross to be identified as the author of this work have been asserted by him in 
accordance with the Copyright, Designs and Patents Act 1988. 


Authorized adaptation from the United States edition, entitled A First Course in Probability, 10th Edition, 
ISBN 9780134753119, by Sheldon Ross, published by Pearson Education © 2019. 


All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or 
transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, 
without either the prior written permission of the publisher or a license permitting restricted copying in 
the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6-10 Kirby Street, 
London EC1N 8TS.


All trademarks used herein are the property of their respective owners. The use of any trademark in this 
text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does 
the use of such trademarks imply any affiliation with or endorsement of this book by such owners. 


Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this 
textbook appear on page 518 within text. 


PEARSON, ALWAYS LEARNING, and MYLAB are exclusive trademarks in the U.S. and/or other 
countries owned by Pearson Education, Inc. or its affiliates. 


Unless otherwise indicated herein, any third-party trademarks that may appear in this work are the prop- 
erty of their respective owners and any references to third-party trademarks, logos or other trade dress 
are for demonstrative or descriptive purposes only. Such references are not intended to imply any spon- 
sorship, endorsement, authorization, or promotion of Pearson’s products by the owners of such marks, 
or any relationship between the owner and Pearson Education, Inc. or its affiliates, authors, licensees or 
distributors. 


This eBook is a standalone product and may or may not include all assets that were part of the print 
version. It also does not provide access to other Pearson digital products like MyLab and Mastering. The 
publisher reserves the right to remove any material in this eBook at any time. 


ISBN 10: 1-292-26920-0 
ISBN 13: 978-1-292-26920-7 
ebook ISBN 13: 978-1-292-26923-8 


British Library Cataloguing-in-Publication Data 
A catalogue record for this book is available from the British Library 


For Rebecca 


CONTENTS


Preface

1 COMBINATORIAL ANALYSIS
  1.1 Introduction
  1.2 The Basic Principle of Counting
  1.3 Permutations
  1.4 Combinations
  1.5 Multinomial Coefficients
  1.6 The Number of Integer Solutions of Equations
  Summary
  Problems
  Theoretical Exercises
  Self-Test Problems and Exercises

2 AXIOMS OF PROBABILITY
  2.1 Introduction
  2.2 Sample Space and Events
  2.3 Axioms of Probability
  2.4 Some Simple Propositions
  2.5 Sample Spaces Having Equally Likely Outcomes
  2.6 Probability as a Continuous Set Function
  2.7 Probability as a Measure of Belief
  Summary
  Problems
  Theoretical Exercises
  Self-Test Problems and Exercises

3 CONDITIONAL PROBABILITY AND INDEPENDENCE
  3.1 Introduction
  3.2 Conditional Probabilities
  3.3 Bayes's Formula
  3.4 Independent Events
  3.5 P(·|F) Is a Probability
  Summary
  Problems
  Theoretical Exercises
  Self-Test Problems and Exercises

4 RANDOM VARIABLES
  4.1 Random Variables
  4.2 Discrete Random Variables
  4.3 Expected Value
  4.4 Expectation of a Function of a Random Variable
  4.5 Variance
  4.6 The Bernoulli and Binomial Random Variables
    4.6.1 Properties of Binomial Random Variables
    4.6.2 Computing the Binomial Distribution Function
  4.7 The Poisson Random Variable
    4.7.1 Computing the Poisson Distribution Function
  4.8 Other Discrete Probability Distributions
    4.8.1 The Geometric Random Variable
    4.8.2 The Negative Binomial Random Variable
    4.8.3 The Hypergeometric Random Variable
    4.8.4 The Zeta (or Zipf) Distribution
  4.9 Expected Value of Sums of Random Variables
  4.10 Properties of the Cumulative Distribution Function
  Summary
  Problems
  Theoretical Exercises
  Self-Test Problems and Exercises

5 CONTINUOUS RANDOM VARIABLES
  5.1 Introduction
  5.2 Expectation and Variance of Continuous Random Variables
  5.3 The Uniform Random Variable
  5.4 Normal Random Variables
    5.4.1 The Normal Approximation to the Binomial Distribution
  5.5 Exponential Random Variables
    5.5.1 Hazard Rate Functions
  5.6 Other Continuous Distributions
    5.6.1 The Gamma Distribution
    5.6.2 The Weibull Distribution
    5.6.3 The Cauchy Distribution
    5.6.4 The Beta Distribution
    5.6.5 The Pareto Distribution
  5.7 The Distribution of a Function of a Random Variable
  Summary
  Problems
  Theoretical Exercises
  Self-Test Problems and Exercises

6 JOINTLY DISTRIBUTED RANDOM VARIABLES
  6.1 Joint Distribution Functions
  6.2 Independent Random Variables
  6.3 Sums of Independent Random Variables
    6.3.1 Identically Distributed Uniform Random Variables
    6.3.2 Gamma Random Variables
    6.3.3 Normal Random Variables
    6.3.4 Poisson and Binomial Random Variables
  6.4 Conditional Distributions: Discrete Case
  6.5 Conditional Distributions: Continuous Case
  6.6 Order Statistics
  6.7 Joint Probability Distribution of Functions of Random Variables
  6.8 Exchangeable Random Variables
  Summary
  Problems
  Theoretical Exercises
  Self-Test Problems and Exercises

7 PROPERTIES OF EXPECTATION
  7.1 Introduction
  7.2 Expectation of Sums of Random Variables
    7.2.1 Obtaining Bounds from Expectations via the Probabilistic Method
    7.2.2 The Maximum-Minimums Identity
  7.3 Moments of the Number of Events that Occur
  7.4 Covariance, Variance of Sums, and Correlations
  7.5 Conditional Expectation
    7.5.1 Definitions
    7.5.2 Computing Expectations by Conditioning
    7.5.3 Computing Probabilities by Conditioning
    7.5.4 Conditional Variance
  7.6 Conditional Expectation and Prediction
  7.7 Moment Generating Functions
    7.7.1 Joint Moment Generating Functions
  7.8 Additional Properties of Normal Random Variables
    7.8.1 The Multivariate Normal Distribution
    7.8.2 The Joint Distribution of the Sample Mean and Sample Variance
  7.9 General Definition of Expectation
  Summary
  Problems
  Theoretical Exercises
  Self-Test Problems and Exercises

8 LIMIT THEOREMS
  8.1 Introduction
  8.2 Chebyshev's Inequality and the Weak Law of Large Numbers
  8.3 The Central Limit Theorem
  8.4 The Strong Law of Large Numbers
  8.5 Other Inequalities and a Poisson Limit Result
  8.6 Bounding the Error Probability When Approximating a Sum of Independent
      Bernoulli Random Variables by a Poisson Random Variable
  8.7 The Lorenz Curve
  Summary
  Problems
  Theoretical Exercises
  Self-Test Problems and Exercises

9 ADDITIONAL TOPICS IN PROBABILITY
  9.1 The Poisson Process
  9.2 Markov Chains
  9.3 Surprise, Uncertainty, and Entropy
  9.4 Coding Theory and Entropy
  Summary
  Problems and Theoretical Exercises
  Self-Test Problems and Exercises

10 SIMULATION
  10.1 Introduction
  10.2 General Techniques for Simulating Continuous Random Variables
    10.2.1 The Inverse Transformation Method
    10.2.2 The Rejection Method
  10.3 Simulating from Discrete Distributions
  10.4 Variance Reduction Techniques
    10.4.1 Use of Antithetic Variables
    10.4.2 Variance Reduction by Conditioning
    10.4.3 Control Variates
  Summary
  Problems
  Self-Test Problems and Exercises

Answers to Selected Problems

Solutions to Self-Test Problems and Exercises

Index


PREFACE


“We see that the theory of probability is at bottom only common sense reduced 
to calculation; it makes us appreciate with exactitude what reasonable minds feel 
by a sort of instinct, often without being able to account for it....It is remarkable 
that this science, which originated in the consideration of games of chance, should 
have become the most important object of human knowledge. ... The most impor- 
tant questions of life are, for the most part, really only problems of probability.” So 
said the famous French mathematician and astronomer (the “Newton of France”) 
Pierre-Simon, Marquis de Laplace. Although many people believe that the famous 
marquis, who was also one of the great contributors to the development of probabil- 
ity, might have exaggerated somewhat, it is nevertheless true that probability theory 
has become a tool of fundamental importance to nearly all scientists, engineers, med- 
ical practitioners, jurists, and industrialists. In fact, the enlightened individual had 
learned to ask not “Is it so?” but rather “What is the probability that it is so?” 


General Approach and Mathematical Level 


This book is intended as an elementary introduction to the theory of probability 
for students in mathematics, statistics, engineering, and the sciences (including com- 
puter science, biology, the social sciences, and management science) who possess the 
prerequisite knowledge of elementary calculus. It attempts to present not only the 
mathematics of probability theory, but also, through numerous examples, the many 
diverse possible applications of this subject. 


Content and Course Planning 


Chapter 1 presents the basic principles of combinatorial analysis, which are most 
useful in computing probabilities. 

Chapter 2 handles the axioms of probability theory and shows how they can be 
applied to compute various probabilities of interest. 

Chapter 3 deals with the extremely important subjects of conditional probability 
and independence of events. By a series of examples, we illustrate how conditional 
probabilities come into play not only when some partial information is available, 
but also as a tool to enable us to compute probabilities more easily, even when 
no partial information is present. This extremely important technique of obtaining 
probabilities by “conditioning” reappears in Chapter 7, where we use it to obtain 
expectations. 

The concept of random variables is introduced in Chapters 4, 5, and 6. Discrete 
random variables are dealt with in Chapter 4, continuous random variables in 
Chapter 5, and jointly distributed random variables in Chapter 6. The important con- 
cepts of the expected value and the variance of a random variable are introduced in 
Chapters 4 and 5, and these quantities are then determined for many of the common 
types of random variables. 
Additional properties of the expected value are considered in Chapter 7. Many
examples illustrating the usefulness of the result that the expected value of a sum 
of random variables is equal to the sum of their expected values are presented. 
Sections on conditional expectation, including its use in prediction, and on moment- 
generating functions are contained in this chapter. In addition, the final section intro- 
duces the multivariate normal distribution and presents a simple proof concerning 
the joint distribution of the sample mean and sample variance of a sample from a 
normal distribution. 

Chapter 8 presents the major theoretical results of probability theory. In par- 
ticular, we prove the strong law of large numbers and the central limit theorem. 
Our proof of the strong law is a relatively simple one that assumes that the random 
variables have a finite fourth moment, and our proof of the central limit theorem 
assumes Lévy’s continuity theorem. This chapter also presents such probability
inequalities as Markov’s inequality, Chebyshev’s inequality, and Chernoff bounds. 
The final section of Chapter 8 gives a bound on the error involved when a probability 
concerning a sum of independent Bernoulli random variables is approximated by the 
corresponding probability of a Poisson random variable having the same expected 
value. 

Chapter 9 presents some additional topics, such as Markov chains, the Poisson 
process, and an introduction to information and coding theory, and Chapter 10 con- 
siders simulation. 

As in the previous edition, three sets of exercises are given at the end of each 
chapter. They are designated as Problems, Theoretical Exercises, and Self-Test Prob- 
lems and Exercises. This last set of exercises, for which complete solutions appear in 
Solutions to Self-Test Problems and Exercises, is designed to help students test their 
comprehension and study for exams. 


Changes for the Tenth Edition 


The tenth edition continues the evolution and fine tuning of the text. Aside from a 
multitude of small changes made to increase the clarity of the text, the new edition 
includes many new and updated problems, exercises, and text material chosen both 
for inherent interest and for their use in building student intuition about probability. 
Illustrative of these goals are Example 4n of Chapter 3, which deals with computing
NCAA basketball tournament win probabilities, and Example 5b of Chapter 4,
which introduces the friendship paradox. There is also new material on the Pareto 
distribution (introduced in Section 5.6.5), on Poisson limit results (in Section 8.5), 
and on the Lorenz curve (in Section 8.7). 
Chapter 1


COMBINATORIAL ANALYSIS


Contents

1.1 Introduction
1.2 The Basic Principle of Counting
1.3 Permutations
1.4 Combinations
1.5 Multinomial Coefficients
1.6 The Number of Integer Solutions of Equations

1.1 Introduction


Here is a typical problem of interest involving probability: A communication system 
is to consist of n seemingly identical antennas that are to be lined up in a linear order. 
The resulting system will then be able to receive all incoming signals—and will be 
called functional—as long as no two consecutive antennas are defective. If it turns 
out that exactly m of the n antennas are defective, what is the probability that the 
resulting system will be functional? For instance, in the special case where n = 4 and 
m = 2, there are 6 possible system configurations, namely, 


0 1 1 0    0 1 0 1    1 0 1 0    0 0 1 1    1 0 0 1    1 1 0 0


where 1 means that the antenna is working and 0 that it is defective. Because the
resulting system will be functional in the first 3 arrangements and not functional in
the remaining 3, it seems reasonable to take $\frac{3}{6} = \frac{1}{2}$ as the desired probability. In
the case of general n and m, we could compute the probability that the system is 
functional in a similar fashion. That is, we could count the number of configurations 
that result in the system’s being functional and then divide by the total number of all 
possible configurations. 
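
The counting recipe just described can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `functional_probability` is hypothetical, a configuration is represented by the positions of its defective antennas, and we assume 0 <= m <= n.

```python
from fractions import Fraction
from itertools import combinations


def functional_probability(n: int, m: int) -> Fraction:
    """Fraction of configurations in which no two defectives are adjacent."""
    total = 0
    functional = 0
    # A configuration is determined by which m of the n positions are defective.
    for defective in combinations(range(n), m):
        total += 1
        # Positions come out sorted, so adjacency means a gap of exactly 1.
        if all(b - a > 1 for a, b in zip(defective, defective[1:])):
            functional += 1
    return Fraction(functional, total)


print(functional_probability(4, 2))  # 1/2
```

Brute-force enumeration of this kind is practical only for small n, which is one reason the counting methods of this chapter are useful.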

From the preceding discussion, we see that it would be useful to have an effec- 
tive method for counting the number of ways that things can occur. In fact, many 
problems in probability theory can be solved simply by counting the number of dif- 
ferent ways that a certain event can occur. The mathematical theory of counting is 
formally known as combinatorial analysis. 
1.2 The Basic Principle of Counting


The basic principle of counting will be fundamental to all our work. Loosely put, it
states that if one experiment can result in any of m possible outcomes and if another
experiment can result in any of n possible outcomes, then there are mn possible
outcomes of the two experiments.


The basic principle of counting


Suppose that two experiments are to be performed. Then if experiment 1 can
result in any one of m possible outcomes and if, for each outcome of experiment
1, there are n possible outcomes of experiment 2, then together there are mn
possible outcomes of the two experiments.


Proof of the Basic Principle: The basic principle may be proven by enumerating all
the possible outcomes of the two experiments; that is,

\[
\begin{array}{cccc}
(1,1), & (1,2), & \ldots, & (1,n) \\
(2,1), & (2,2), & \ldots, & (2,n) \\
\vdots & & & \\
(m,1), & (m,2), & \ldots, & (m,n)
\end{array}
\]

where we say that the outcome is (i, j) if experiment 1 results in its ith possible
outcome and experiment 2 then results in its jth possible outcome. Hence, the set of
possible outcomes consists of m rows, each containing n elements. This proves the
result.


Example 2a

A small community consists of 10 women, each of whom has 3 children. If one
woman and one of her children are to be chosen as mother and child of the year,
how many different choices are possible?


Solution By regarding the choice of the woman as the outcome of the first experiment
and the subsequent choice of one of her children as the outcome of the second
experiment, we see from the basic principle that there are $10 \times 3 = 30$ possible
choices. ■
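
The enumeration used in the proof can be mirrored directly in code. The following Python sketch is our own illustration (the labels 1..10 and 1..3 are arbitrary); it lists the pairs (i, j) for the mother-and-child example and counts them.

```python
from itertools import product

women = range(1, 11)     # 10 women, labeled 1..10 (labels are arbitrary)
children = range(1, 4)   # each woman's 3 children, labeled 1..3

# Every outcome is a pair (i, j): woman i, then her j-th child.
choices = list(product(women, children))

print(len(choices))                                # 30
print(len(choices) == len(women) * len(children))  # True
```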


When there are more than two experiments to be performed, the basic principle 
can be generalized. 


The generalized basic principle of counting 


If r experiments that are to be performed are such that the first one may result
in any of $n_1$ possible outcomes; and if, for each of these $n_1$ possible outcomes,
there are $n_2$ possible outcomes of the second experiment; and if, for each of the
possible outcomes of the first two experiments, there are $n_3$ possible outcomes
of the third experiment; and if ..., then there is a total of $n_1 \cdot n_2 \cdots n_r$ possible
outcomes of the r experiments.
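
For small cases the generalized principle can be checked by listing every combined outcome. In the sketch below, the function name `count_outcomes` and the sizes 3, 4, 5, 2 are illustrative choices of ours, and each experiment is assumed to have the same number of outcomes regardless of earlier results.

```python
from itertools import product
from math import prod


def count_outcomes(sizes):
    """Count combined outcomes by listing them; sizes[i] is the number of
    outcomes of experiment i + 1."""
    return sum(1 for _ in product(*(range(n) for n in sizes)))


sizes = [3, 4, 5, 2]  # illustrative values of n_1, ..., n_r
print(count_outcomes(sizes), prod(sizes))  # 120 120
```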


Example 2b

A college planning committee consists of 3 freshmen, 4 sophomores, 5 juniors, and 2
seniors. A subcommittee of 4, consisting of 1 person from each class, is to be chosen.
How many different subcommittees are possible?


Solution We may regard the choice of a subcommittee as the combined outcome of
the four separate experiments of choosing a single representative from each of the
classes. It then follows from the generalized version of the basic principle that there
are $3 \times 4 \times 5 \times 2 = 120$ possible subcommittees. ■


Example 2c

How many different 7-place license plates are possible if the first 3 places are to be
occupied by letters and the final 4 by numbers?


Solution By the generalized version of the basic principle, the answer is
$26 \cdot 26 \cdot 26 \cdot 10 \cdot 10 \cdot 10 \cdot 10 = 175{,}760{,}000$. ■


Example 2d

How many functions defined on n points are possible if each functional value is
either 0 or 1?


Solution Let the points be 1, 2, ..., n. Since f(i) must be either 0 or 1 for each
i = 1, 2, ..., n, it follows that there are $2^n$ possible functions. ■


Example 2e

In Example 2c, how many license plates would be possible if repetition among letters
or numbers were prohibited?


Solution In this case, there would be
$26 \cdot 25 \cdot 24 \cdot 10 \cdot 9 \cdot 8 \cdot 7 = 78{,}624{,}000$
possible license plates. ■


1.3 Permutations


How many different ordered arrangements of the letters a, b, and c are possible?
By direct enumeration we see that there are 6, namely, abc, acb, bac, bca, cab,
and cba. Each arrangement is known as a permutation. Thus, there are 6 possible
permutations of a set of 3 objects. This result could also have been obtained
from the basic principle, since the first object in the permutation can be any of
the 3, the second object in the permutation can then be chosen from any of the
remaining 2, and the third object in the permutation is then the remaining 1.
Thus, there are $3 \cdot 2 \cdot 1 = 6$ possible permutations.


Suppose now that we have n objects. Reasoning similar to that we have just used
for the 3 letters then shows that there are

\[
n(n-1)(n-2) \cdots 3 \cdot 2 \cdot 1 = n!
\]

different permutations of the n objects.


Whereas n! (read as “n factorial”) is defined to equal $1 \cdot 2 \cdots n$ when n is a
positive integer, it is convenient to define 0! to equal 1.


Example 3a

How many different batting orders are possible for a baseball team consisting of 9
players?


Solution There are 9! = 362,880 possible batting orders. ■
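
The direct enumeration of the arrangements of a, b, and c, and the factorial count, can both be reproduced with Python's standard library. The snippet is a small illustration of ours and not part of the original development.

```python
from itertools import permutations
from math import factorial

letters = "abc"
arrangements = ["".join(p) for p in permutations(letters)]

print(arrangements)  # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
print(len(arrangements) == factorial(len(letters)))  # True
print(factorial(9))  # 362880, the number of batting orders in Example 3a
```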
Example 3b

A class in probability theory consists of 6 men and 4 women. An examination is
given, and the students are ranked according to their performance. Assume that no
two students obtain the same score.

(a) How many different rankings are possible?

(b) If the men are ranked just among themselves and the women just among
themselves, how many different rankings are possible?


Solution (a) Because each ranking corresponds to a particular ordered arrangement
of the 10 people, the answer to this part is 10! = 3,628,800.

(b) Since there are 6! possible rankings of the men among themselves and 4!
possible rankings of the women among themselves, it follows from the basic principle
that there are (6!)(4!) = (720)(24) = 17,280 possible rankings in this case. ■


Example 3c

Ms. Jones has 10 books that she is going to put on her bookshelf. Of these, 4 are
mathematics books, 3 are chemistry books, 2 are history books, and 1 is a language book.
Ms. Jones wants to arrange her books so that all the books dealing with the same
subject are together on the shelf. How many different arrangements are possible?


Solution There are 4! 3! 2! 1! arrangements such that the mathematics books are
first in line, then the chemistry books, then the history books, and then the language
book. Similarly, for each possible ordering of the subjects, there are 4! 3! 2! 1!
possible arrangements. Hence, as there are 4! possible orderings of the subjects, the
desired answer is 4! 4! 3! 2! 1! = 6912. ■

We shall now determine the number of permutations of a set of n objects when
certain of the objects are indistinguishable from one another. To set this situation
straight in our minds, consider the following example.


Example 3d

How many different letter arrangements can be formed from the letters PEPPER?


Solution We first note that there are 6! permutations of the letters $P_1 E_1 P_2 P_3 E_2 R$
when the 3 P’s and the 2 E’s are distinguished from one another. However, consider
any one of these permutations, for instance, $P_1 P_2 E_1 P_3 E_2 R$. If we now permute the
P’s among themselves and the E’s among themselves, then the resultant arrangement
would still be of the form PPEPER. That is, all 3! 2! permutations

\[
\begin{array}{ll}
P_1 P_2 E_1 P_3 E_2 R & P_1 P_2 E_2 P_3 E_1 R \\
P_1 P_3 E_1 P_2 E_2 R & P_1 P_3 E_2 P_2 E_1 R \\
P_2 P_1 E_1 P_3 E_2 R & P_2 P_1 E_2 P_3 E_1 R \\
P_2 P_3 E_1 P_1 E_2 R & P_2 P_3 E_2 P_1 E_1 R \\
P_3 P_1 E_1 P_2 E_2 R & P_3 P_1 E_2 P_2 E_1 R \\
P_3 P_2 E_1 P_1 E_2 R & P_3 P_2 E_2 P_1 E_1 R
\end{array}
\]

are of the form PPEPER. Hence, there are 6!/(3! 2!) = 60 possible letter
arrangements of the letters PEPPER. ■


In general, the same reasoning as that used in Example 3d shows that there are

\[
\frac{n!}{n_1!\, n_2! \cdots n_r!}
\]

different permutations of n objects, of which $n_1$ are alike, $n_2$ are alike, ..., $n_r$ are
alike.
Example 3e

A chess tournament has 10 competitors, of which 4 are Russian, 3 are from the
United States, 2 are from Great Britain, and 1 is from Brazil. If the tournament
result lists just the nationalities of the players in the order in which they placed, how
many outcomes are possible?


Solution There are

\[
\frac{10!}{4!\, 3!\, 2!\, 1!} = 12{,}600
\]

possible outcomes. ■


Example 3f

How many different signals, each consisting of 9 flags hung in a line, can be made
from a set of 4 white flags, 3 red flags, and 2 blue flags if all flags of the same color
are identical?


Solution There are

\[
\frac{9!}{4!\, 3!\, 2!} = 1260
\]

different signals. ■
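
The formula for permutations with indistinguishable objects can be compared against a brute-force count. In the sketch below, the helper name `arrangements` is our own; the brute-force part collapses the 6! labeled orderings of PEPPER into distinct words, as in the argument of Example 3d.

```python
from itertools import permutations
from math import factorial


def arrangements(counts):
    """n! / (n_1! n_2! ... n_r!), where counts lists the n_i."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)  # each intermediate quotient is an integer
    return result


# Brute force: identical letters make many labeled orderings coincide.
distinct = {"".join(p) for p in permutations("PEPPER")}

print(len(distinct), arrangements([3, 2, 1]))  # 60 60
print(arrangements([4, 3, 2, 1]))              # 12600 (Example 3e)
print(arrangements([4, 3, 2]))                 # 1260 (Example 3f)
```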


1.4 Combinations


We are often interested in determining the number of different groups of r objects
that could be formed from a total of n objects. For instance, how many different
groups of 3 could be selected from the 5 items A, B, C, D, and E? To answer this
question, reason as follows: Since there are 5 ways to select the initial item, 4 ways to
then select the next item, and 3 ways to select the final item, there are thus $5 \cdot 4 \cdot 3$
ways of selecting the group of 3 when the order in which the items are selected is
relevant. However, since every group of 3 (say, the group consisting of items A, B,
and C) will be counted 6 times (that is, all of the permutations ABC, ACB, BAC,
BCA, CAB, and CBA will be counted when the order of selection is relevant), it
follows that the total number of groups that can be formed is

\[
\frac{5 \cdot 4 \cdot 3}{3 \cdot 2 \cdot 1} = 10
\]

In general, as $n(n-1) \cdots (n-r+1)$ represents the number of different ways that
a group of r items could be selected from n items when the order of selection is
relevant, and as each group of r items will be counted r! times in this count, it follows
that the number of different groups of r items that could be formed from a set of n
items is

\[
\frac{n(n-1) \cdots (n-r+1)}{r!} = \frac{n!}{(n-r)!\, r!}
\]
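
The argument above (count ordered selections, then divide by the r! orderings of each group) can be checked numerically. The following is an illustrative sketch of ours; `math.comb` is Python's built-in for the count just derived.

```python
from itertools import combinations
from math import comb, factorial

items = "ABCDE"
groups = ["".join(g) for g in combinations(items, 3)]
print(len(groups))  # 10

# Ordered selections, then divide out the r! orderings of each group.
n, r = len(items), 3
ordered = factorial(n) // factorial(n - r)   # n(n-1)...(n-r+1) = 5*4*3
print(ordered // factorial(r), comb(n, r))   # 10 10
```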


Notation and terminology


We define $\binom{n}{r}$, for $r \le n$, by

\[
\binom{n}{r} = \frac{n!}{(n-r)!\, r!}
\]

and say that $\binom{n}{r}$ (read as “n choose r”) represents the number of possible
combinations of n objects taken r at a time.


Thus, $\binom{n}{r}$ represents the number of different groups of size r that could be
selected from a set of n objects when the order of selection is not considered relevant.
Equivalently, $\binom{n}{r}$ is the number of subsets of size r that can be chosen from
a set of size n. Using that 0! = 1, note that
$\binom{n}{n} = \binom{n}{0} = \frac{n!}{0!\, n!} = 1$, which is
consistent with the preceding interpretation because in a set of size n there is exactly
1 subset of size n (namely, the entire set), and exactly one subset of size 0 (namely
the empty set). A useful convention is to define $\binom{n}{r}$ equal to 0 when either r > n
or r < 0.


Example 4a

A committee of 3 is to be formed from a group of 20 people. How many different
committees are possible?


Solution There are
$\binom{20}{3} = \frac{20 \cdot 19 \cdot 18}{3 \cdot 2 \cdot 1} = 1140$
possible committees. ■


Example 4b

From a group of 5 women and 7 men, how many different committees consisting of
2 women and 3 men can be formed? What if 2 of the men are feuding and refuse to
serve on the committee together?


Solution As there are $\binom{5}{2}$ possible groups of 2 women, and $\binom{7}{3}$ possible groups
of 3 men, it follows from the basic principle that there are
$\binom{5}{2}\binom{7}{3} = \frac{5 \cdot 4}{2 \cdot 1} \cdot \frac{7 \cdot 6 \cdot 5}{3 \cdot 2 \cdot 1} = 350$
possible committees consisting of 2 women and 3 men.

Now suppose that 2 of the men refuse to serve together. Because a total of
$\binom{2}{2}\binom{5}{1} = 5$ out of the $\binom{7}{3} = 35$ possible groups of 3 men contain both of
the feuding men, it follows that there are 35 - 5 = 30 groups that do not contain
both of the feuding men. Because there are still $\binom{5}{2} = 10$ ways to choose the 2
women, there are $30 \cdot 10 = 300$ possible committees in this case. ■


Example 4c

Consider a set of n antennas of which m are defective and n - m are functional
and assume that all of the defectives and all of the functionals are considered
indistinguishable. How many linear orderings are there in which no two defectives are
consecutive?


Solution Imagine that the n - m functional antennas are lined up among
themselves. Now, if no two defectives are to be consecutive, then the spaces between the
functional antennas must each contain at most one defective antenna.

^ 1 ^ 1 ^ 1 ... ^ 1 ^ 1 ^

1 = functional
^ = place for at most one defective

Figure 1.1 No consecutive defectives.

That is, in the n - m + 1 possible positions (represented in Figure 1.1 by carets)
between the n - m functional antennas, we must select m of these in which to put the
defective antennas. Hence, there are $\binom{n-m+1}{m}$ possible orderings in which there is at
least one functional antenna between any two defective ones. ■
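
As a sanity check on the answer to Example 4c, the orderings can be counted by brute force for small n and compared with the closed form. The function name `count_no_adjacent` is our own, and an ordering is represented by the positions of its defective antennas.

```python
from itertools import combinations
from math import comb


def count_no_adjacent(n: int, m: int) -> int:
    """Brute-force count of orderings with no two defectives consecutive."""
    return sum(
        1
        for pos in combinations(range(n), m)
        if all(b - a > 1 for a, b in zip(pos, pos[1:]))
    )


# Compare with the closed form for all small cases.
for n in range(1, 9):
    for m in range(0, n + 1):
        assert count_no_adjacent(n, m) == comb(n - m + 1, m)

print(count_no_adjacent(4, 2), comb(3, 2))  # 3 3
```

The case n = 4, m = 2 gives the 3 functional configurations found in the chapter's introductory example.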


A useful combinatorial identity, known as Pascal's identity, is 


\[
\binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}, \qquad 1 \le r \le n \tag{4.1}
\]


Equation (4.1) may be proved analytically or by the following combinatorial
argument: Consider a group of n objects, and fix attention on some particular one of
these objects; call it object 1. Now, there are $\binom{n-1}{r-1}$ groups of size r that
contain object 1 (since each such group is formed by selecting r - 1 from the remaining
n - 1 objects). Also, there are $\binom{n-1}{r}$ groups of size r that do not contain object
1. As there is a total of $\binom{n}{r}$ groups of size r, Equation (4.1) follows.
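
The combinatorial argument can be traced on a concrete case by splitting the groups of size r according to whether they contain object 1. The values n = 6 and r = 3 below are arbitrary illustrative choices of ours.

```python
from itertools import combinations
from math import comb

n, r = 6, 3
objects = range(1, n + 1)

groups = list(combinations(objects, r))
with_1 = [g for g in groups if 1 in g]
without_1 = [g for g in groups if 1 not in g]

print(len(with_1), comb(n - 1, r - 1))              # 10 10
print(len(without_1), comb(n - 1, r))               # 10 10
print(len(groups) == len(with_1) + len(without_1))  # True
```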


The values $\binom{n}{r}$ are often referred to as binomial coefficients because of their
prominence in the binomial theorem.


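The binomial theorem

\[
(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \tag{4.2}
\]


We shall present two proofs of the binomial theorem. The first is a proof by
mathematical induction, and the second is a proof based on combinatorial
considerations.


Proof of the Binomial Theorem by Induction: When n = 1, Equation (4.2) reduces to

\[
x + y = \binom{1}{0} x^0 y^1 + \binom{1}{1} x^1 y^0 = y + x
\]

Assume Equation (4.2) for n - 1. Now,

\[
\begin{aligned}
(x + y)^n &= (x + y)(x + y)^{n-1} \\
&= (x + y) \sum_{k=0}^{n-1} \binom{n-1}{k} x^k y^{n-1-k} \\
&= \sum_{k=0}^{n-1} \binom{n-1}{k} x^{k+1} y^{n-1-k} + \sum_{k=0}^{n-1} \binom{n-1}{k} x^k y^{n-k}
\end{aligned}
\]

Letting i = k + 1 in the first sum and i = k in the second sum, we find that

\[
\begin{aligned}
(x + y)^n &= \sum_{i=1}^{n} \binom{n-1}{i-1} x^i y^{n-i} + \sum_{i=0}^{n-1} \binom{n-1}{i} x^i y^{n-i} \\
&= x^n + \sum_{i=1}^{n-1} \left[ \binom{n-1}{i-1} + \binom{n-1}{i} \right] x^i y^{n-i} + y^n \\
&= x^n + \sum_{i=1}^{n-1} \binom{n}{i} x^i y^{n-i} + y^n \\
&= \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}
\end{aligned}
\]

where the next-to-last equality follows by Equation (4.1). By induction, the theorem
is now proved.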


Combinatorial Proof of the Binomial Theorem: Consider the product

\[
(x_1 + y_1)(x_2 + y_2) \cdots (x_n + y_n)
\]

Its expansion consists of the sum of $2^n$ terms, each term being the product of n
factors. Furthermore, each of the $2^n$ terms in the sum will contain as a factor either $x_i$
or $y_i$ for each i = 1, 2, ..., n. For example,

\[
(x_1 + y_1)(x_2 + y_2) = x_1 x_2 + x_1 y_2 + y_1 x_2 + y_1 y_2
\]

Now, how many of the $2^n$ terms in the sum will have k of the $x_i$’s and (n - k) of
the $y_i$’s as factors? As each term consisting of k of the $x_i$’s and (n - k) of the $y_i$’s
corresponds to a choice of a group of k from the n values $x_1, x_2, \ldots, x_n$, there are
$\binom{n}{k}$ such terms. Thus, letting $x_i = x$, $y_i = y$, i = 1, ..., n, we see that

\[
(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}
\]
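
Both views of the binomial theorem lend themselves to a quick numerical check. The sketch below is our own illustration: `binomial_sum` (a hypothetical helper name) evaluates the right side of Equation (4.2) for integer inputs, and the second part counts, among the 2^n terms of the expanded product, how many contain exactly k of the x factors.

```python
from itertools import product
from math import comb


def binomial_sum(x: int, y: int, n: int) -> int:
    """Right-hand side of the binomial theorem, evaluated exactly."""
    return sum(comb(n, k) * x**k * y**(n - k) for k in range(n + 1))


print(binomial_sum(2, 3, 5), (2 + 3) ** 5)  # 3125 3125

# Combinatorial view: each of the 2^n terms picks x_i or y_i from factor i.
n = 4
terms = list(product("xy", repeat=n))
x_counts = [sum(1 for c in t if c == "x") for t in terms]
print([x_counts.count(k) for k in range(n + 1)])  # [1, 4, 6, 4, 1]
```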
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Example Expand (x + y)°. 


4d 
Solution 
3_(3),03 oa ee: 3)\ 21 3 \ 3,0 
(x + y) =(3)" + (7) + (3) + L xy 
=y + 3xy? + 3xy 38 B 
Example How many subsets are there of a set consisting of n elements? 
4e 


‘ . n ; F . 
Solution Since there are ( ie subsets of size k, the desired answer is 


»(7)=c +1" =2" 


k=0 


This result could also have been obtained by assigning either the number 0 or the
number 1 to each element in the set. To each assignment of numbers, there
corresponds, in a one-to-one fashion, a subset, namely, that subset consisting of all
elements that were assigned the value 1. As there are $2^n$ possible assignments, the
result follows.

Note that we have included the set consisting of 0 elements (that is, the null set)
as a subset of the original set. Hence, the number of subsets that contain at least 1
element is $2^n - 1$.
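
The 0/1 assignment argument translates directly into a short enumeration. The following sketch (function name illustrative) builds the subset belonging to each assignment and confirms the counts for a 4-element set.

```python
from itertools import product

def subsets_via_assignments(elements):
    """Map every 0/1 assignment to the subset of elements that were assigned 1."""
    elements = list(elements)
    result = []
    for bits in product((0, 1), repeat=len(elements)):
        result.append(frozenset(e for e, b in zip(elements, bits) if b == 1))
    return result

subsets = subsets_via_assignments("abcd")
assert len(subsets) == 2**4                      # all subsets, null set included
assert len(set(subsets)) == len(subsets)         # distinct assignments, distinct subsets
assert sum(1 for s in subsets if s) == 2**4 - 1  # subsets with at least 1 element
```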


1.5 Multinomial Coefficients 


In this section, we consider the following problem: A set of $n$ distinct items is to be
divided into $r$ distinct groups of respective sizes $n_1, n_2, \ldots, n_r$, where $\sum_{i=1}^{r} n_i = n$.
How many different divisions are possible? To answer this question, we note that
there are $\binom{n}{n_1}$ possible choices for the first group; for each choice of the first group,
there are $\binom{n - n_1}{n_2}$ possible choices for the second group; for each choice of the
first two groups, there are $\binom{n - n_1 - n_2}{n_3}$ possible choices for the third group; and
so on. It then follows from the generalized version of the basic counting principle
that there are

\[
\begin{aligned}
\binom{n}{n_1} \binom{n - n_1}{n_2} \cdots \binom{n - n_1 - n_2 - \cdots - n_{r-1}}{n_r}
&= \frac{n!}{(n - n_1)!\, n_1!} \cdot \frac{(n - n_1)!}{(n - n_1 - n_2)!\, n_2!} \cdots \frac{(n - n_1 - n_2 - \cdots - n_{r-1})!}{0!\, n_r!} \\
&= \frac{n!}{n_1!\, n_2! \cdots n_r!}
\end{aligned}
\]

possible divisions.
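
The telescoping product above can be checked numerically. In this minimal sketch (both function names are illustrative), one function multiplies the successive binomial choices and the other evaluates the closed form; they should agree for any list of group sizes.

```python
from math import comb, factorial, prod

def divisions_by_sequential_choice(sizes):
    """Choose the first group, then the second from what is left, and so on."""
    remaining = sum(sizes)
    count = 1
    for size in sizes:
        count *= comb(remaining, size)
        remaining -= size
    return count

def divisions_by_factorials(sizes):
    """n! / (n_1! n_2! ... n_r!) with n = n_1 + ... + n_r."""
    return factorial(sum(sizes)) // prod(factorial(s) for s in sizes)

for sizes in [(4, 3, 1), (2, 2, 2, 2), (5, 0, 3), (7,)]:
    assert divisions_by_sequential_choice(sizes) == divisions_by_factorials(sizes)
```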
Another way to see this result is to consider the $n$ values $1, 1, \ldots, 1, 2, \ldots, 2, \ldots,
r, \ldots, r$, where $i$ appears $n_i$ times, for $i = 1, \ldots, r$. Every permutation of these values
corresponds to a division of the $n$ items into the $r$ groups in the following manner:
Let the permutation $i_1, i_2, \ldots, i_n$ correspond to assigning item 1 to group $i_1$, item 2 to
group $i_2$, and so on. For instance, if $n = 8$ and if $n_1 = 4$, $n_2 = 3$, and $n_3 = 1$, then
the permutation 1, 1, 2, 3, 2, 1, 2, 1 corresponds to assigning items 1, 2, 6, 8 to the first
group, items 3, 5, 7 to the second group, and item 4 to the third group. Because every
permutation yields a division of the items and every possible division results from
some permutation, it follows that the number of divisions of $n$ items into $r$ distinct
groups of sizes $n_1, n_2, \ldots, n_r$ is the same as the number of permutations of $n$ items
of which $n_1$ are alike, and $n_2$ are alike, $\ldots$, and $n_r$ are alike, which was shown in
Section 1.3 to equal $\dfrac{n!}{n_1!\, n_2! \cdots n_r!}$.


Notation

If $n_1 + n_2 + \cdots + n_r = n$, we define $\binom{n}{n_1, n_2, \ldots, n_r}$ by

\[
\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\, n_2! \cdots n_r!}
\]

Thus, $\binom{n}{n_1, n_2, \ldots, n_r}$ represents the number of possible divisions of $n$ distinct
objects into $r$ distinct groups of respective sizes $n_1, n_2, \ldots, n_r$.


Example 5a  A police department in a small city consists of 10 officers. If the department policy is
to have 5 of the officers patrolling the streets, 2 of the officers working full time at the
station, and 3 of the officers on reserve at the station, how many different divisions
of the 10 officers into the 3 groups are possible?

Solution There are $\dfrac{10!}{5!\, 2!\, 3!} = 2520$ possible divisions.

Example 5b  Ten children are to be divided into an A team and a B team of 5 each. The A team
will play in one league and the B team in another. How many different divisions are
possible?

Solution There are $\dfrac{10!}{5!\, 5!} = 252$ possible divisions.
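
The two counts above, for the police officers and for the A and B teams, can be reproduced in a few lines. This is only a sketch: `multinomial` is a hypothetical helper name, and the brute-force cross-check relies on the observation that choosing the A team determines the B team.

```python
from itertools import combinations
from math import factorial, prod

def multinomial(sizes):
    """Number of divisions of sum(sizes) distinct items into labeled groups of these sizes."""
    return factorial(sum(sizes)) // prod(factorial(s) for s in sizes)

officers = multinomial((5, 2, 3))   # patrol, station, reserve
teams = multinomial((5, 5))         # A team, B team

# Brute-force cross-check for the labeled teams: enumerate every possible A team.
children = range(10)
enumerated = sum(1 for _ in combinations(children, 5))

assert officers == 2520
assert teams == 252 == enumerated
```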


Example 5c  In order to play a game of basketball, 10 children at a playground divide themselves
into two teams of 5 each. How many different divisions are possible?

Solution Note that this example is different from Example 5b because now the order
of the two teams is irrelevant. That is, there is no A or B team, but just a division
consisting of 2 groups of 5 each. Hence, the desired answer is

\[
\frac{10!/(5!\, 5!)}{2!} = 126
\]

The proof of the following theorem, which generalizes the binomial theorem, is
left as an exercise.


The multinomial theorem

\[
(x_1 + x_2 + \cdots + x_r)^n = \sum_{\substack{(n_1, \ldots, n_r): \\ n_1 + \cdots + n_r = n}} \binom{n}{n_1, n_2, \ldots, n_r} x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}
\]

That is, the sum is over all nonnegative integer-valued vectors $(n_1, n_2, \ldots, n_r)$
such that $n_1 + n_2 + \cdots + n_r = n$.

The numbers $\binom{n}{n_1, n_2, \ldots, n_r}$ are known as multinomial coefficients.

Example 5d  In the first round of a knockout tournament involving $n = 2^m$ players, the $n$ players
are divided into $n/2$ pairs, with each of these pairs then playing a game. The losers
of the games are eliminated while the winners go on to the next round, where the
process is repeated until only a single player remains. Suppose we have a knockout
tournament of 8 players.


(a) How many possible outcomes are there for the initial round? (For instance, 
one outcome is that 1 beats 2, 3 beats 4, 5 beats 6, and 7 beats 8.) 

(b) How many outcomes of the tournament are possible, where an outcome gives 
complete information for all rounds? 


Solution One way to determine the number of possible outcomes for the initial
round is to first determine the number of possible pairings for that round. To do so,
note that the number of ways to divide the 8 players into a first pair, a second pair, a
third pair, and a fourth pair is $\binom{8}{2, 2, 2, 2} = \dfrac{8!}{2^4}$. Thus, the number of possible
pairings when there is no ordering of the 4 pairs is $\dfrac{8!}{2^4\, 4!}$. For each such pairing, there are
2 possible choices from each pair as to the winner of that game, showing that there
are $\dfrac{8!\, 2^4}{2^4\, 4!} = \dfrac{8!}{4!}$ possible results of round 1. [Another way to see this is to note that
there are $\binom{8}{4}$ possible choices of the 4 winners and, for each such choice, there are
$4!$ ways to pair the 4 winners with the 4 losers, showing that there are $4! \binom{8}{4} = \dfrac{8!}{4!}$
possible results for the first round.]

Similarly, for each result of round 1, there are $\dfrac{4!}{2!}$ possible outcomes of round 2,
and for each of the outcomes of the first two rounds, there are $\dfrac{2!}{1!}$ possible outcomes
of round 3. Consequently, by the generalized basic principle of counting, there are
$\dfrac{8!}{4!} \cdot \dfrac{4!}{2!} \cdot \dfrac{2!}{1!} = 8!$ possible outcomes of the tournament. Indeed, the same argument
can be used to show that a knockout tournament of $n = 2^m$ players has $n!$ possible
outcomes.

Knowing the preceding result, it is not difficult to come up with a more direct
argument by showing that there is a one-to-one correspondence between the set of
possible tournament results and the set of permutations of $1, \ldots, n$. To obtain such
a correspondence, rank the players as follows for any tournament result: Give the
tournament winner rank 1, and give the final-round loser rank 2. For the two players
who lost in the next-to-last round, give rank 3 to the one who lost to the player
ranked 1 and give rank 4 to the one who lost to the player ranked 2. For the four players
who lost in the round before that, give rank 5 to the one who lost to the player
ranked 1, rank 6 to the one who lost to the player ranked 2, rank 7 to the one who
lost to the player ranked 3, and rank 8 to the one who lost to the player ranked 4.
Continuing on in this manner gives a rank to each player. (A more succinct description
is to give the winner of the tournament rank 1 and let the rank of a player who
lost in a round having $2^k$ matches be $2^k$ plus the rank of the player who beat him, for
$k = 0, \ldots, m - 1$.) In this manner, the result of the tournament can be represented
by a permutation $i_1, i_2, \ldots, i_n$, where $i_j$ is the player who was given rank $j$. Because
different tournament results give rise to different permutations, and because there is
a tournament result for each permutation, it follows that there are the same number
of possible tournament results as there are permutations of $1, \ldots, n$.
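
For the 8-player tournament, both counts can be confirmed by brute force. The sketch below uses hypothetical helper names and takes a round result to be a pairing together with a winner from each pair. It enumerates every pairing and every choice of winners, then recurses on the winners.

```python
from itertools import product
from math import factorial

def pairings(players):
    """Yield every way to split a tuple of players into unordered pairs."""
    if not players:
        yield []
        return
    first, rest = players[0], players[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def count_round_results(players):
    """Number of (pairing, winners) combinations for a single round."""
    return sum(1 for pairing in pairings(players) for _ in product(*pairing))

def count_tournament_outcomes(players):
    """Number of complete outcomes: every round's pairing and winners."""
    if len(players) == 1:
        return 1
    return sum(
        count_tournament_outcomes(tuple(sorted(winners)))
        for pairing in pairings(players)
        for winners in product(*pairing)
    )

players = tuple(range(1, 9))
assert count_round_results(players) == factorial(8) // factorial(4)
assert count_tournament_outcomes(players) == factorial(8)
```

The enumeration grows like n!, so it is practical only for small tournaments such as this one.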


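Example 5e

\[
\begin{aligned}
(x_1 + x_2 + x_3)^2 &= \binom{2}{2, 0, 0} x_1^2 x_2^0 x_3^0 + \binom{2}{0, 2, 0} x_1^0 x_2^2 x_3^0 + \binom{2}{0, 0, 2} x_1^0 x_2^0 x_3^2 \\
&\quad + \binom{2}{1, 1, 0} x_1^1 x_2^1 x_3^0 + \binom{2}{1, 0, 1} x_1^1 x_2^0 x_3^1 + \binom{2}{0, 1, 1} x_1^0 x_2^1 x_3^1 \\
&= x_1^2 + x_2^2 + x_3^2 + 2x_1 x_2 + 2x_1 x_3 + 2x_2 x_3
\end{aligned}
\]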
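
The expansion above can be generated mechanically. The following sketch (function names illustrative) lists every exponent vector summing to n together with its multinomial coefficient, then evaluates the expansion at integer points to compare it with the power computed directly.

```python
from itertools import product
from math import factorial, prod

def multinomial_terms(r, n):
    """Map each exponent vector (n_1, ..., n_r) summing to n to its multinomial coefficient."""
    terms = {}
    for exps in product(range(n + 1), repeat=r):
        if sum(exps) == n:
            terms[exps] = factorial(n) // prod(factorial(e) for e in exps)
    return terms

def evaluate(terms, xs):
    """Evaluate the expansion at the numeric point xs."""
    return sum(c * prod(x**e for x, e in zip(xs, exps)) for exps, c in terms.items())

terms = multinomial_terms(3, 2)
assert terms[(2, 0, 0)] == 1 and terms[(1, 1, 0)] == 2 and len(terms) == 6
assert evaluate(terms, (2, 3, 5)) == (2 + 3 + 5) ** 2
assert evaluate(multinomial_terms(4, 3), (1, -2, 3, 4)) == (1 - 2 + 3 + 4) ** 3
```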


*1.6 The Number of Integer Solutions of Equations


An individual has gone fishing at Lake Ticonderoga, which contains four types of
fish: lake trout, catfish, bass, and bluefish. If we take the result of the fishing trip to
be the numbers of each type of fish caught, let us determine the number of possible
outcomes when a total of 10 fish are caught. To do so, note that we can denote the
outcome of the fishing trip by the vector $(x_1, x_2, x_3, x_4)$, where $x_1$ is the number of
trout that are caught, $x_2$ is the number of catfish, $x_3$ is the number of bass, and $x_4$ is
the number of bluefish. Thus, the number of possible outcomes when a total of 10 fish
are caught is the number of nonnegative integer vectors $(x_1, x_2, x_3, x_4)$ that sum to 10.

More generally, if we supposed there were $r$ types of fish and that a total of $n$
were caught, then the number of possible outcomes would be the number of
nonnegative integer-valued vectors $(x_1, \ldots, x_r)$ such that

\[
x_1 + x_2 + \cdots + x_r = n \tag{6.1}
\]

To compute this number, let us start by considering the number of positive
integer-valued vectors $(x_1, \ldots, x_r)$ that satisfy the preceding. To determine this number,
suppose that we have $n$ consecutive zeroes lined up in a row:

0 0 0 . . . 0 0


* Asterisks denote material that is optional. 
0 ∧ 0 ∧ 0 ∧ . . . ∧ 0 ∧ 0

n objects 0

Choose r − 1 of the spaces ∧.

Figure 1.2 Number of positive solutions.


Note that any selection of $r - 1$ of the $n - 1$ spaces between adjacent zeroes (see
Figure 1.2) corresponds to a positive solution of (6.1) by letting $x_1$ be the number of
zeroes before the first chosen space, $x_2$ be the number of zeroes between the first
and second chosen space, $\ldots$, and $x_r$ be the number of zeroes following the last
chosen space.

For instance, if we have $n = 8$ and $r = 3$, then (with the choices represented by dots)
the choice

0 . 0 0 0 0 . 0 0 0

corresponds to the solution $x_1 = 1$, $x_2 = 4$, $x_3 = 3$. As positive solutions of (6.1)
correspond, in a one-to-one fashion, to choices of $r - 1$ of the adjacent spaces, it
follows that the number of different positive solutions is equal to the number of
different selections of $r - 1$ of the $n - 1$ adjacent spaces. Consequently, we have
the following proposition.
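
The correspondence between chosen spaces and positive solutions can be written out as code. In this sketch the function names and the numbering of the spaces are assumptions made for illustration: space j, for j from 1 to n − 1, is taken to be the gap immediately after the j-th zero.

```python
from itertools import combinations
from math import comb

def solution_from_spaces(n, chosen_spaces):
    """Turn chosen spaces into a positive solution of x_1 + ... + x_r = n.

    Space j (1 <= j <= n - 1) is the gap right after the j-th zero.
    """
    cuts = [0, *sorted(chosen_spaces), n]
    return tuple(b - a for a, b in zip(cuts, cuts[1:]))

def positive_solutions(n, r):
    """All positive integer vectors of length r summing to n, one per choice of r - 1 spaces."""
    return [solution_from_spaces(n, c) for c in combinations(range(1, n), r - 1)]

# The choice 0.0000.000 uses the spaces after the 1st and 5th zeroes.
assert solution_from_spaces(8, (1, 5)) == (1, 4, 3)

sols = positive_solutions(8, 3)
assert len(sols) == len(set(sols)) == comb(7, 2)
assert all(sum(s) == 8 and min(s) > 0 for s in sols)
```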


Proposition 6.1  There are $\binom{n-1}{r-1}$ distinct positive integer-valued vectors $(x_1, x_2, \ldots, x_r)$
satisfying the equation

\[
x_1 + x_2 + \cdots + x_r = n, \qquad x_i > 0, \quad i = 1, \ldots, r
\]

To obtain the number of nonnegative (as opposed to positive) solutions, note
that the number of nonnegative solutions of $x_1 + x_2 + \cdots + x_r = n$ is the same
as the number of positive solutions of $y_1 + \cdots + y_r = n + r$ (seen by letting
$y_i = x_i + 1$, $i = 1, \ldots, r$). Hence, from Proposition 6.1, we obtain the following
proposition.

Proposition 6.2  There are $\binom{n+r-1}{r-1}$ distinct nonnegative integer-valued vectors $(x_1, x_2, \ldots, x_r)$
satisfying the equation

\[
x_1 + x_2 + \cdots + x_r = n
\]


Thus, using Proposition 6.2, we see that there are $\binom{13}{3} = 286$ possible outcomes
when a total of 10 Lake Ticonderoga fish are caught.
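
The fish count can be verified by direct enumeration. The sketch below (function name illustrative) tests every vector with entries between 0 and n and keeps those that sum to n, then compares the result with the formula of Proposition 6.2.

```python
from itertools import product
from math import comb

def count_nonnegative_solutions(n, r):
    """Brute-force count of nonnegative integer vectors (x_1, ..., x_r) with sum n."""
    return sum(1 for xs in product(range(n + 1), repeat=r) if sum(xs) == n)

# Four fish types, ten fish caught.
assert count_nonnegative_solutions(10, 4) == comb(13, 3) == 286

# A few smaller cases of Proposition 6.2.
for n, r in [(3, 2), (5, 3), (6, 4)]:
    assert count_nonnegative_solutions(n, r) == comb(n + r - 1, r - 1)
```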

Example  How many distinct nonnegative integer-valued solutions of $x_1 + x_2 = 3$ are possible?

Solution There are $\binom{3 + 2 - 1}{2 - 1} = 4$ such solutions: (0, 3), (1, 2), (2, 1), (3, 0).


An investor has $20,000 to invest among 4 possible investments. Each investment 
must be in units of $1000. If the total $20,000 is to be invested, how many different 
investment strategies are possible? What if not all the money needs to be invested? 


Solution If we let $x_i$, $i = 1, 2, 3, 4$, denote the number of thousands invested in
investment $i$, then, when all is to be invested, $x_1, x_2, x_3, x_4$ are integers satisfying the
equation

\[
x_1 + x_2 + x_3 + x_4 = 20, \qquad x_i \ge 0
\]

Hence, by Proposition 6.2, there are $\binom{23}{3} = 1771$ possible investment strategies. If
not all of the money needs to be invested, then if we let $x_5$ denote the amount (in
thousands) kept in reserve, a strategy is a nonnegative integer-valued vector
$(x_1, x_2, x_3, x_4, x_5)$ satisfying the equation

\[
x_1 + x_2 + x_3 + x_4 + x_5 = 20
\]

Hence, by Proposition 6.2, there are now $\binom{24}{4} = 10{,}626$ possible strategies.
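
Both investment counts follow from Proposition 6.2 and can be cross-checked against each other. In the sketch below (function name illustrative), the reserve case is recomputed by conditioning on the total amount t actually invested, which is an alternative route not taken in the text.

```python
from math import comb

def nonnegative_solutions(n, r):
    """Proposition 6.2: number of nonnegative integer vectors of length r summing to n."""
    return comb(n + r - 1, r - 1)

all_invested = nonnegative_solutions(20, 4)
with_reserve = nonnegative_solutions(20, 5)

# Cross-check: invest exactly t thousand for each t = 0, ..., 20 and add up the cases.
by_total = sum(nonnegative_solutions(t, 4) for t in range(21))

assert all_invested == 1771
assert with_reserve == by_total == 10626
```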


How many terms are there in the multinomial expansion of $(x_1 + x_2 + \cdots + x_r)^n$?

Solution

\[
(x_1 + x_2 + \cdots + x_r)^n = \sum \binom{n}{n_1, \ldots, n_r} x_1^{n_1} \cdots x_r^{n_r}
\]

where the sum is over all nonnegative integer-valued $(n_1, \ldots, n_r)$ such that
$n_1 + \cdots + n_r = n$. Hence, by Proposition 6.2, there are $\binom{n + r - 1}{r - 1}$ such terms.


Let us consider again Example 4c, in which we have a set of $n$ items, of which $m$ are
(indistinguishable and) defective and the remaining $n - m$ are (also indistinguishable
and) functional. Our objective is to determine the number of linear orderings
in which no two defectives are next to each other. To determine this number, let us
imagine that the defective items are lined up among themselves and the functional
ones are now to be put in position. Let us denote $x_1$ as the number of functional
items to the left of the first defective, $x_2$ as the number of functional items between
the first two defectives, and so on. That is, schematically, we have

\[
x_1 \; 0 \; x_2 \; 0 \; \cdots \; x_m \; 0 \; x_{m+1}
\]

Now, there will be at least one functional item between any pair of defectives as long
as $x_i > 0$, $i = 2, \ldots, m$. Hence, the number of outcomes satisfying the condition is
the number of vectors $(x_1, \ldots, x_{m+1})$ that satisfy the equation

\[
x_1 + \cdots + x_{m+1} = n - m, \qquad x_1 \ge 0, \; x_{m+1} \ge 0, \; x_i > 0, \; i = 2, \ldots, m
\]

But, on letting $y_1 = x_1 + 1$, $y_i = x_i$, $i = 2, \ldots, m$, $y_{m+1} = x_{m+1} + 1$, we see that
this number is equal to the number of positive vectors $(y_1, \ldots, y_{m+1})$ that satisfy the
equation

\[
y_1 + y_2 + \cdots + y_{m+1} = n - m + 2
\]

Hence, by Proposition 6.1, there are $\binom{n - m + 1}{m}$ such outcomes, in agreement
with the results of Example 4c.
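
The count of orderings with no two defectives adjacent can be confirmed for small cases by enumeration. This sketch (function name illustrative) identifies an ordering with the set of positions occupied by the defectives, which suffices because items of each kind are indistinguishable.

```python
from itertools import combinations
from math import comb

def count_separated_orderings(n, m):
    """Count orderings of m defective and n - m functional items with no two defectives adjacent.

    An ordering is determined by which of the positions 0 .. n-1 hold the defectives.
    """
    count = 0
    for positions in combinations(range(n), m):
        # Consecutive defective positions must differ by at least 2.
        if all(b - a >= 2 for a, b in zip(positions, positions[1:])):
            count += 1
    return count

for n, m in [(5, 2), (7, 3), (8, 4), (6, 0), (4, 3)]:
    assert count_separated_orderings(n, m) == comb(n - m + 1, m)
```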

Suppose now that we are interested in the number of outcomes in which each
pair of defective items is separated by at least 2 functional items. By the same
reasoning as that applied previously, this would equal the number of vectors satisfying
the equation

\[
x_1 + \cdots + x_{m+1} = n - m, \qquad x_1 \ge 0, \; x_{m+1} \ge 0, \; x_i \ge 2, \; i = 2, \ldots, m
\]

Upon letting $y_1 = x_1 + 1$, $y_i = x_i - 1$, $i = 2, \ldots, m$, $y_{m+1} = x_{m+1} + 1$, we see that
this is the same as the number of positive solutions of the equation

\[
y_1 + \cdots + y_{m+1} = n - 2m + 3
\]


Hence, from Proposition 6.1, there are $\binom{n - 2m + 2}{m}$ such outcomes.


Summary


The basic principle of counting states that if an experiment
consisting of two phases is such that there are $n$ possible
outcomes of phase 1 and, for each of these $n$ outcomes,
there are $m$ possible outcomes of phase 2, then there are
$nm$ possible outcomes of the experiment.

There are $n! = n(n - 1) \cdots 3 \cdot 2 \cdot 1$ possible linear
orderings of $n$ items. The quantity $0!$ is defined to equal 1.

Let

\[
\binom{n}{i} = \frac{n!}{(n - i)!\, i!}
\]

when $0 \le i \le n$, and let it equal 0 otherwise. This quantity
represents the number of different subgroups of size $i$
that can be chosen from a set of size $n$. It is often called a
binomial coefficient because of its prominence in the binomial
theorem, which states that

\[
(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}
\]

For nonnegative integers $n_1, \ldots, n_r$ summing to $n$,

\[
\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\, n_2! \cdots n_r!}
\]

is the number of divisions of $n$ items into $r$ distinct
nonoverlapping subgroups of sizes $n_1, n_2, \ldots, n_r$. These
quantities are called multinomial coefficients.


Problems


1. (a) How many different 7-place license plates are possible
if the first 2 places are for letters and the other 5 for
numbers?

(b) Repeat part (a) under the assumption that no letter or
number can be repeated in a single license plate.


2. How many outcome sequences are possible when a die
is rolled four times, where we say, for instance, that the
outcome is 3, 4, 3, 1 if the first roll landed on 3, the second
on 4, the third on 3, and the fourth on 1?


3. Ten employees of a company are to be assigned to 10 
different managerial posts, one to each post. In how many 
ways can these posts be filled? 


4. John, Jim, Jay, and Jack have formed a band con- 
sisting of 4 instruments. If each of the boys can play 
all 4 instruments, how many different arrangements are 
possible? What if John and Jim can play all 4 instru- 
ments, but Jay and Jack can each play only piano and 
drums? 
5. A safe can be opened by inserting a code consisting of
three digits between 0 and 9. How many codes are possi- 
ble? How many codes are possible with no digit repeated? 
How many codes starting with a 1 are possible? 


6. A well-known nursery rhyme starts as follows: 
“As I was going to St. Ives 
I met a man with 7 wives. 
Each wife had 7 sacks. 
Each sack had 7 cats. 
Each cat had 7 kittens...” 
How many kittens did the traveler meet? 


7. (a) In how many ways can 3 boys and 3 girls sit in a row? 


(b) In how many ways can 3 boys and 3 girls sit in a row if 
the boys and the girls are each to sit together? 


(c) In how many ways if only the boys must sit together? 


(d) In how many ways if no two people of the same sex are 
allowed to sit together? 


8. When all letters are used, how many different letter 
arrangements can be made from the letters 

(a) Partying? 

(b) Dancing? 

(c) Acting? 

(d) Singing? 

9. A box contains 13 balls, of which 4 are yellow, 4 are 


green, 3 are red, and 2 are blue. Find the number of ways 
in which these balls can be arranged in a line. 


10. In how many ways can 8 people be seated in a row if 


(a) there are no restrictions on the seating arrangement? 
(b) persons A and B must sit next to each other? 

(c) there are 4 men and 4 women and no 2 men or 2 women 
can sit next to each other? 

(d) there are 5 men and they must sit next to one another? 
(e) there are 4 married couples and each couple must sit 
together? 


11. In how many ways can 3 novels, 2 mathematics books, 
and 1 chemistry book be arranged on a bookshelf if 


(a) the books can be arranged in any order? 

(b) the mathematics books must be together and the nov- 
els must be together? 

(c) the novels must be together, but the other books can 
be arranged in any order? 


12. How many 3-digit numbers xyz, with x, y, z all ranging
from 0 to 9, have at least 2 of their digits equal? How many
have exactly 2 equal digits?


13. How many different letter configurations of length 
4 or 5 can be formed using the letters of the word 
ACHIEVE? 


14. Five separate awards (best scholarship, best leadership 
qualities, and so on) are to be presented to selected stu- 
dents from a class of 30. How many different outcomes 
are possible if 


(a) a student can receive any number of awards? 
(b) each student can receive at most 1 award? 


15. Consider a group of 20 people. If everyone shakes 
hands with everyone else, how many handshakes take 
place? 


16. How many distinct triangles can be drawn by joining
3 of 8 dots on a piece of paper? Note that the dots are placed in
such a way that no 3 of them lie on a straight line.


17. A dance class consists of 22 students, of which 10 are 
women and 12 are men. If 5 men and 5 women are to be 
chosen and then paired off, how many results are possible? 


18. A team consisting of 5 players is to be chosen from 
a class of 12 boys and 9 girls. How many choices are 
possible if 


(a) all players are of the same gender? 
(b) the team includes both genders? 


19. Seven different gifts are to be distributed among 10 
children. How many distinct results are possible if no child 
is to receive more than one gift? 


20. A team of 9, consisting of 2 mathematicians, 3 statisti- 
cians, and 4 physicists, is to be selected from a faculty of 10 
mathematicians, 8 statisticians, and 7 physicists. How many 
teams are possible? 


21. From a group of 8 women and 6 men, a committee con- 
sisting of 3 men and 3 women is to be formed. How many 
different committees are possible if 


(a) 2 of the men refuse to serve together? 
(b) 2 of the women refuse to serve together? 
(c) 1 man and 1 woman refuse to serve together? 


22. A person has 8 friends, of whom 5 will be invited to a 
party. 

(a) How many choices are there if 2 of the friends are feud- 
ing and will not attend together? 

(b) How many choices if 2 of the friends will only attend 
together? 


23. Consider the grid of points shown at the top of the 
next column. Suppose that, starting at the point labeled 
A, you can go one step up or one step to the right at each 
move. This procedure is continued until the point labeled 
B is reached. How many different paths from A to B are 
possible? 


Hint: Note that to reach B from A, you must take 4 steps 
to the right and 3 steps upward. 


                B
•   •   •   •   •
•   •   •   •   •
•   •   •   •   •
•   •   •   •   •
A


24. In Problem 23, how many different paths are there 
from A to B that go through the point circled in the fol- 
lowing lattice? 


[Figure: the lattice of Problem 23, from A to B, with one intermediate point circled.]


25. A psychology laboratory conducting dream research 
contains 3 rooms, with 2 beds in each room. If 3 sets of 
identical twins are to be assigned to these 6 beds so that 
each set of twins sleeps in different beds in the same room, 
how many assignments are possible? 


26. (a) Show $\sum_{k=0}^{n} \binom{n}{k} 2^k = 3^n$.
(b) Simplify $\sum_{k=0}^{n} \binom{n}{k} x^k$.

27. Expand $(4x - 3y)^4$.


28. The game of bridge is played by 4 players, each of 
whom is dealt 13 cards. How many bridge deals are pos- 
sible? 


29. Expand $(x_1 + 2x_2 + 3x_3)^4$.


30. If 12 people are to be divided into 3 committees 
of respective sizes 3, 4, and 5, how many divisions are 
possible? 
31. If 10 gifts are to be distributed among 3 friends,
how many distributions are possible? What if each friend 
should receive at least 3 gifts? 


32. Ten weight lifters are competing in a team weight- 
lifting contest. Of the lifters, 3 are from the United States, 
4 are from Russia, 2 are from China, and 1 is from Canada. 
If the scoring takes account of the countries that the lifters 
represent, but not their individual identities, how many 
different outcomes are possible from the point of view 
of scores? How many different outcomes correspond to 
results in which the United States has 1 competitor in the 
top three and 2 in the bottom three? 


33. Delegates from 10 countries, including Russia, France, 
England, and the United States, are to be seated in a row. 
How many different seating arrangements are possible if 
the French and English delegates are to be seated next to 
each other and the Russian and U.S. delegates are not to 
be next to each other? 


*34. If 8 identical blackboards are to be divided among 4
schools, how many divisions are possible? How many if 
each school must receive at least 1 blackboard? 


*35. An elevator starts at the basement with 8 people (not 
including the elevator operator) and discharges them all by 
the time it reaches the top floor, number 6. In how many 
ways could the operator have perceived the people leaving 
the elevator if all people look alike to him? What if the 8 
people consisted of 5 men and 3 women and the operator 
could tell a man from a woman? 


*36. We have $20,000 that must be invested among 4 pos-
sible opportunities. Each investment must be integral in 
units of $1000, and there are minimal investments that 
need to be made if one is to invest in these opportuni- 
ties. The minimal investments are $2000, $2000, $3000, 
and $4000. How many different investment strategies are 
available if 


(a) an investment must be made in each opportunity? 


(b) investments must be made in at least 3 of the 4 oppor- 
tunities? 


*37. Suppose that 10 fish are caught at a lake that contains
5 distinct types of fish. 


(a) How many different outcomes are possible, where an 
outcome specifies the numbers of caught fish of each of 
the 5 types? 

(b) How many outcomes are possible when 3 of the 10 fish 
caught are trout? 

(c) How many when at least 2 of the 10 are trout? 

Theoretical Exercises


1. Prove the generalized version of the basic counting principle.


2. Two experiments are to be performed. The first can
result in any one of $m$ possible outcomes. If the first experiment
results in outcome $i$, then the second experiment
can result in any of $n_i$ possible outcomes, $i = 1, 2, \ldots, m$.
What is the number of possible outcomes of the two experiments?


3. In how many ways can r objects be selected from a set of 
n objects if the order of selection is considered relevant? 


4. There are $\binom{n}{r}$ different linear arrangements of $n$ balls
of which $r$ are black and $n - r$ are white. Give a combinatorial
explanation of this fact.


5. Determine the number of vectors $(x_1, \ldots, x_n)$, such that
each $x_i$ is either 0 or 1 and

\[
\sum_{i=1}^{n} x_i \ge k
\]


6. How many vectors $x_1, \ldots, x_k$ are there for which each $x_i$
is a positive integer such that $1 \le x_i \le n$ and $x_1 < x_2 < \cdots < x_k$?


7. Give an analytic proof of Equation (4.1). 


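8. Prove that

\[
\binom{n + m}{r} = \binom{n}{0}\binom{m}{r} + \binom{n}{1}\binom{m}{r - 1} + \cdots + \binom{n}{r}\binom{m}{0}
\]

Hint: Consider a group of $n$ men and $m$ women. How many
groups of size $r$ are possible?


9. Use Theoretical Exercise 8 to prove that

\[
\binom{2n}{n} = \sum_{k=0}^{n} \binom{n}{k}^2
\]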


10. From a group of $n$ people, suppose that we want to
choose a committee of $k$, $k \le n$, one of whom is to be
designated as chairperson.

(a) By focusing first on the choice of the committee and
then on the choice of the chair, argue that there are $\binom{n}{k} k$
possible choices.

(b) By focusing first on the choice of the nonchair
committee members and then on the choice of the chair,
argue that there are $\binom{n}{k - 1}(n - k + 1)$ possible
choices.

(c) By focusing first on the choice of the chair and then
on the choice of the other committee members, argue that
there are $n \binom{n - 1}{k - 1}$ possible choices.

(d) Conclude from parts (a), (b), and (c) that

\[
k \binom{n}{k} = (n - k + 1) \binom{n}{k - 1} = n \binom{n - 1}{k - 1}
\]

(e) Use the factorial definition of $\binom{m}{r}$ to verify the identity
in part (d).


11. The following identity is known as Fermat's combinatorial
identity:

\[
\binom{n}{k} = \sum_{i=k}^{n} \binom{i - 1}{k - 1}, \qquad n \ge k
\]

Give a combinatorial argument (no computations are
needed) to establish this identity.

Hint: Consider the set of numbers 1 through $n$. How many
subsets of size $k$ have $i$ as their highest numbered member?


12. Consider the following combinatorial identity:

\[
\sum_{k=1}^{n} k \binom{n}{k} = n \cdot 2^{n-1}
\]

(a) Present a combinatorial argument for this identity by
considering a set of $n$ people and determining, in two ways,
the number of possible selections of a committee of any
size and a chairperson for the committee.
Hint:
(i) How many possible selections are there of a committee
of size $k$ and its chairperson?
(ii) How many possible selections are there of a chairperson
and the other committee members?

(b) Verify the following identity for $n = 1, 2, 3, 4, 5$:

\[
\sum_{k=1}^{n} \binom{n}{k} k^2 = 2^{n-2} n(n + 1)
\]

For a combinatorial proof of the preceding, consider a set
of $n$ people and argue that both sides of the identity
represent the number of different selections of a committee,
its chairperson, and its secretary (possibly the same as the
chairperson).

Hint:
(i) How many different selections result in the committee
containing exactly $k$ people?

(ii) How many different selections are there in which
the chairperson and the secretary are the same?
(ANSWER: $n 2^{n-1}$.)

(iii) How many different selections result in the chairperson
and the secretary being different?

(c) Now argue that

\[
\sum_{k=1}^{n} \binom{n}{k} k^3 = 2^{n-3} n^2 (n + 3)
\]
k=1 


13. Show that, for $n > 0$,

\[
\sum_{i=0}^{n} (-1)^i \binom{n}{i} = 0
\]

Hint: Use the binomial theorem.


14. From a set of $n$ people, a committee of size $j$ is to be
chosen, and from this committee, a subcommittee of size
$i$, $i \le j$, is also to be chosen.

(a) Derive a combinatorial identity by computing, in two
ways, the number of possible choices of the committee and
subcommittee: first by supposing that the committee is
chosen first and then the subcommittee is chosen, and second
by supposing that the subcommittee is chosen first and
then the remaining members of the committee are chosen.

(b) Use part (a) to prove the following combinatorial identity:

\[
\sum_{j=i}^{n} \binom{n}{j} \binom{j}{i} = \binom{n}{i} 2^{n-i}, \qquad i \le n
\]

(c) Use part (a) and Theoretical Exercise 13 to show that

\[
\sum_{j=i}^{n} \binom{n}{j} \binom{j}{i} (-1)^{n-j} = 0, \qquad i < n
\]


15. Let $H_k(n)$ be the number of vectors $x_1, \ldots, x_k$ for
which each $x_i$ is a positive integer satisfying $1 \le x_i \le n$
and $x_1 \le x_2 \le \cdots \le x_k$.

(a) Without any computations, argue that

\[
H_1(n) = n
\]

\[
H_k(n) = \sum_{j=1}^{n} H_{k-1}(j), \qquad k > 1
\]

Hint: How many vectors are there in which $x_k = j$?

(b) Use the preceding recursion to compute $H_3(5)$.
Hint: First compute $H_2(n)$ for $n = 1, 2, 3, 4, 5$.

16. Consider a tournament of $n$ contestants in which the
outcome is an ordering of these contestants, with ties
allowed. That is, the outcome partitions the players into
groups, with the first group consisting of the players who
tied for first place, the next group being those who tied
for the next-best position, and so on. Let $N(n)$ denote
the number of different possible outcomes. For instance,
$N(2) = 3$, since, in a tournament with 2 contestants, player
1 could be uniquely first, player 2 could be uniquely first,
or they could tie for first.

(a) List all the possible outcomes when $n = 3$.
(b) With $N(0)$ defined to equal 1, argue, without any
computations, that

\[
N(n) = \sum_{i=1}^{n} \binom{n}{i} N(n - i)
\]

Hint: How many outcomes are there in which $i$ players tie
for last place?

(c) Show that the formula of part (b) is equivalent to the
following:

\[
N(n) = \sum_{i=0}^{n-1} \binom{n}{i} N(i)
\]

(d) Use the recursion to find $N(3)$ and $N(4)$.


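17. Present a combinatorial explanation of why $\binom{n}{r} = \binom{n}{r, n - r}$.


18. Argue that

\[
\binom{n}{n_1, n_2, \ldots, n_r} = \binom{n - 1}{n_1 - 1, n_2, \ldots, n_r} + \binom{n - 1}{n_1, n_2 - 1, \ldots, n_r} + \cdots + \binom{n - 1}{n_1, n_2, \ldots, n_r - 1}
\]

Hint: Use an argument similar to the one used to establish
Equation (4.1).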


19. Prove the multinomial theorem. 


*20. In how many ways can $n$ identical balls be distributed
into $r$ urns so that the $i$th urn contains at least $m_i$ balls, for
each $i = 1, \ldots, r$? Assume that $n \ge \sum_{i=1}^{r} m_i$.

*21. Argue that there are exactly $\binom{r}{k} \binom{n - 1}{n - r + k}$
solutions of

\[
x_1 + x_2 + \cdots + x_r = n
\]

for which exactly $k$ of the $x_i$ are equal to 0.

*22. Consider a function $f(x_1, \ldots, x_n)$ of $n$ variables. How
many different partial derivatives of order $r$ does $f$
possess?


*23. Determine the number of vectors $(x_1, \ldots, x_n)$ such that
each $x_i$ is a nonnegative integer and

\[
\sum_{i=1}^{n} x_i \le k
\]


Self-Test Problems and Exercises


1. How many different linear arrangements are there of
the letters A, B, C, D, E, F for which

(a) A and B are next to each other?

(b) A is before B?

(c) A is before B and B is before C?

(d) A is before B and C is before D?

(e) A and B are next to each other and C and D are also
next to each other?

(f) E is not last in line?


2. If 4 Americans, 3 French people, and 3 British people
are to be seated in a row, how many seating arrangements
are possible when people of the same nationality must sit
next to each other?


3. A president, treasurer, and secretary, all different, are to
be chosen from a club consisting of 10 people. How many
different choices of officers are possible if

(a) there are no restrictions?

(b) A and B will not serve together?

(c) C and D will serve together or not at all?

(d) E must be an officer?

(e) F will serve only if he is president?


4. A student is to answer 7 out of 10 questions in an
examination. How many choices has she? How many if she must
answer at least 3 of the first 5 questions?


5. In how many ways can a man divide 7 gifts among his 3
children if the eldest is to receive 3 gifts and the others 2
each?


6. How many different 7-place license plates are possible
when 3 of the entries are letters and 4 are digits? Assume
that repetition of letters and numbers is allowed and that
there is no restriction on where the letters or numbers can
be placed.


7. Give a combinatorial explanation of the identity


8. Consider n-digit numbers where each digit is one of the 
10 integers 0,1,...,9. How many such numbers are there 
for which 


(a) no two consecutive digits are equal? 


(b) 0 appears as a digit a total of i times, i = 0,...,n? 

9. Consider three classes, each consisting of n students. 
From this group of 3n students, a group of 3 students is 
to be chosen. 


(a) How many choices are possible? 

(b) How many choices are there in which all 3 students are 
in the same class? 

(c) How many choices are there in which 2 of the 3 stu- 
dents are in the same class and the other student is in a 
different class? 

(d) How many choices are there in which all 3 students are 
in different classes? 

(e) Using the results of parts (a) through (d), write a com- 
binatorial identity. 


10. How many 5-digit numbers can be formed from the 
integers 1,2,...,9 if no digit can appear more than twice? 
(For instance, 41434 is not allowed.) 


11. From 10 married couples, we want to select a group of 
6 people that is not allowed to contain a married couple. 


(a) How many choices are there? 


(b) How many choices are there if the group must also 
consist of 3 men and 3 women? 


12. A committee of 6 people is to be chosen from a group 
consisting of 7 men and 8 women. If the committee must 
consist of at least 3 women and at least 2 men, how many 
different committees are possible? 


*13. An art collection on auction consisted of 4 Dalis, 5 van 
Goghs, and 6 Picassos. At the auction were 5 art collectors. 
If a reporter noted only the number of Dalis, van Goghs, 
and Picassos acquired by each collector, how many differ- 
ent results could have been recorded if all of the works 
were sold? 


*14. Determine the number of vectors $(x_1, \ldots, x_n)$ such that 
each $x_i$ is a positive integer and 

\[ \sum_{i=1}^{n} x_i \le k \]

where $k \ge n$. 


15. A total of m students are enrolled in a review 
course for the actuarial examination in probability. The 
posted results of the examination will list the names of 
those who passed, in decreasing order of their scores. 
For instance, the posted result will be “Brown, Cho” 
if Brown and Cho are the only ones to pass, with 
Brown receiving the higher score. Assuming that all 
scores are distinct (no ties), how many posted results are 
possible? 


16. How many subsets of size 4 of the set S = {1,2,...,20} 
contain at least one of the elements 1,2,3,4,5? 


17. Give an analytic verification of 


\[ \binom{n}{2} = \binom{k}{2} + k(n - k) + \binom{n-k}{2}, \qquad 1 \le k \le n \]


Now, give a combinatorial argument for this identity. 

18. In a certain community, there are 3 families consisting 
of a single parent and 1 child, 3 families consisting of a sin- 
gle parent and 2 children, 5 families consisting of 2 parents 
and a single child, 7 families consisting of 2 parents and 2 
children, and 6 families consisting of 2 parents and 3 chil- 
dren. If a parent and child from the same family are to be 
chosen, how many possible choices are there? 


19. If there are no restrictions on where the digits and let- 
ters are placed, how many 8-place license plates consisting 
of 5 letters and 3 digits are possible if no repetitions of 
letters or digits are allowed? What if the 3 digits must be 
consecutive? 


20. Verify the identity 


\[ \sum_{x_1 + \cdots + x_r = n,\; x_i \ge 0} \frac{n!}{x_1!\, x_2! \cdots x_r!} = r^n \]


(a) by a combinatorial argument that first notes that $r^n$ 
is the number of different n letter sequences that can be 
formed from an alphabet consisting of r letters, and then 
determines how many of these letter sequences have let- 
ter 1 a total of $x_1$ times and letter 2 a total of $x_2$ times and 
... and letter r a total of $x_r$ times; 


(b) by using the multinomial theorem. 


21. Simplify $n - \binom{n}{2} + \binom{n}{3} - \cdots + (-1)^{n+1} \binom{n}{n}$ 


2.1 Introduction 


In this chapter, we introduce the concept of the probability of an event and then 
show how probabilities can be computed in certain situations. As a preliminary, 
however, we need to discuss the concept of the sample space and the events of an 
experiment. 


2.2 Sample Space and Events 


Consider an experiment whose outcome is not predictable with certainty. However, 
although the outcome of the experiment will not be known in advance, let us suppose 
that the set of all possible outcomes is known. This set of all possible outcomes of 
an experiment is known as the sample space of the experiment and is denoted by S. 
Following are some examples: 


1. If the outcome of an experiment consists of the determination of the sex of a 
newborn child, then 


S = {g, b} 


where the outcome g means that the child is a girl and b that it is a boy. 


2. If the outcome of an experiment is the order of finish in a race among the 7 
horses having post positions 1, 2, 3, 4, 5, 6, and 7, then 


S = {all 7! permutations of (1,2,3,4,5, 6, 7)} 


The outcome (2, 3, 1, 6, 5, 4, 7) means, for instance, that the number 2 horse 
comes in first, then the number 3 horse, then the number 1 horse, and so on. 
3. If the experiment consists of flipping two coins, then the sample space consists 
of the following four points: 


S = {(h, h), (h, t), (t, h), (t, t)} 


The outcome will be (h, h) if both coins are heads, (h, t) if the first coin is heads 
and the second tails, (t, h) if the first is tails and the second heads, and (t, t) if 
both coins are tails. 

4. If the experiment consists of tossing two dice, then the sample space consists 
of the 36 points 


S = {(i, j): i, j = 1, 2, 3, 4, 5, 6} 


where the outcome (i, j) is said to occur if i appears on the leftmost die and j 
on the other die. 

5. If the experiment consists of measuring (in hours) the lifetime of a transistor, 
then the sample space consists of all nonnegative real numbers; that is, 


S = {x: 0 ≤ x < ∞} 
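
The finite sample spaces of Examples 1 through 4 are small enough to list explicitly. The following Python sketch builds them with the standard `itertools` module; the variable names are illustrative choices and do not come from the text. Example 5 is omitted because its sample space is an uncountable interval.

```python
from itertools import permutations, product

# Example 1: sex of a newborn child.
S_child = {"g", "b"}

# Example 2: order of finish among 7 horses (all 7! permutations).
S_race = set(permutations(range(1, 8)))

# Example 3: flipping two coins.
S_coins = set(product("ht", repeat=2))

# Example 4: tossing two dice.
S_dice = {(i, j) for i in range(1, 7) for j in range(1, 7)}

print(len(S_child), len(S_race), len(S_coins), len(S_dice))
```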


Any subset E of the sample space is known as an event. In other words, an event 
is a set consisting of possible outcomes of the experiment. If the outcome of the 
experiment is contained in E, then we say that E has occurred. Following are some 
examples of events. 

In the preceding Example 1, if E = {g}, then E is the event that the child is a 
girl. Similarly, if F = {b}, then F is the event that the child is a boy. 

In Example 2, if 


E = {all outcomes in S starting with a 3} 


then E is the event that horse 3 wins the race. 

In Example 3, if E = {(h, h), (h, t)}, then E is the event that a head appears on 
the first coin. 

In Example 4, if E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, then E is the event 
that the sum of the dice equals 7. 

In Example 5, if E = {x: 0 ≤ x ≤ 5}, then E is the event that the transistor does 
not last longer than 5 hours. 

For any two events E and F of a sample space S, we define the new event E ∪ F 
to consist of all outcomes that are either in E or in F or in both E and F. That is, the 
event E ∪ F will occur if either E or F occurs. For instance, in Example 1, if E = {g} 
is the event that the child is a girl and F = {b} is the event that the child is a boy, 
then 

E ∪ F = {g, b} 


is the whole sample space S. In Example 3, if E = {(h,h), (h, t)} is the event that the 
first coin lands heads, and F = {(t,h), (h,h)} is the event that the second coin lands 
heads, then 

E ∪ F = {(h, h), (h, t), (t, h)} 


is the event that at least one of the coins lands heads and thus will occur unless 
both coins land tails. 

The event E ∪ F is called the union of the event E and the event F. 

Similarly, for any two events E and F, we may also define the new event EF, 
called the intersection of E and F, to consist of all outcomes that are both in E and 
in F. That is, the event EF (sometimes written E ∩ F) will occur only if both E and F 
occur. For instance, in Example 3, if E = {(h, h), (h, t), (t, h)} is the event that at least 
1 head occurs and F = {(h, t), (t, h), (t, t)} is the event that at least 1 tail occurs, then 


EF = {(h, t), (t, h)} 


is the event that exactly 1 head and 1 tail occur. In Example 4, if E = {(1, 6), (2, 5), 
(3, 4), (4, 3), (5, 2), (6, 1)} is the event that the sum of the dice is 7 and F = {(1, 5), (2, 4), 
(3, 3), (4, 2), (5, 1)} is the event that the sum is 6, then the event EF does not contain 
any outcomes and hence could not occur. To give such an event a name, we shall refer 
to it as the null event and denote it by ∅. (That is, ∅ refers to the event consisting of 
no outcomes.) If EF = ∅, then E and F are said to be mutually exclusive. 
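
Because events are sets of outcomes, unions and intersections can be tried out directly with a set type. The sketch below restates the coin and dice events from the preceding paragraphs as Python sets; the variable names are illustrative assumptions.

```python
# Events from Example 3 (two coins), written as sets of outcomes.
E = {("h", "h"), ("h", "t"), ("t", "h")}   # at least 1 head
F = {("h", "t"), ("t", "h"), ("t", "t")}   # at least 1 tail

union = E | F          # E ∪ F: at least one of E, F occurs
intersection = E & F   # EF: both E and F occur

# Events from Example 4 (two dice): the sum is 7, and the sum is 6.
dice = {(i, j) for i in range(1, 7) for j in range(1, 7)}
sum7 = {o for o in dice if sum(o) == 7}
sum6 = {o for o in dice if sum(o) == 6}

# Mutually exclusive means that the intersection is the null event.
mutually_exclusive = (sum7 & sum6) == set()
```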

We define unions and intersections of more than two events in a similar manner. 
If $E_1, E_2, \ldots$ are events, then the union of these events, denoted by 
$\bigcup_{n=1}^{\infty} E_n$, is defined to be that event that consists of all outcomes 
that are in $E_n$ for at least one value of n = 1, 2, .... Similarly, the intersection 
of the events $E_n$, denoted by $\bigcap_{n=1}^{\infty} E_n$, is defined to be the event 
consisting of those outcomes that are in all of the events $E_n$, n = 1, 2, .... 

Finally, for any event E, we define the new event $E^c$, referred to as the com- 
plement of E, to consist of all outcomes in the sample space S that are not in E. 
That is, $E^c$ will occur if and only if E does not occur. In Example 4, if event E = 
{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, then $E^c$ will occur when the sum of the dice 
does not equal 7. Note that because the experiment must result in some outcome, it 
follows that $S^c = \emptyset$. 

For any two events E and F, if all of the outcomes in E are also in F, then we 
say that E is contained in F, or E is a subset of F, and write E ⊂ F (or equivalently, 
F ⊃ E, which we sometimes say as F is a superset of E). Thus, if E ⊂ F, then the 
occurrence of E implies the occurrence of F. If E ⊂ F and F ⊂ E, we say that E 
and F are equal and write E = F. 

A graphical representation that is useful for illustrating logical relations among 
events is the Venn diagram. The sample space S is represented as consisting of 
all the outcomes in a large rectangle, and the events E,F,G,... are represented 
as consisting of all the outcomes in given circles within the rectangle. Events of 
interest can then be indicated by shading appropriate regions of the diagram. For 
instance, in the three Venn diagrams shown in Figure 2.1, the shaded areas represent, 
respectively, the events E ∪ F, EF, and $E^c$. The Venn diagram in Figure 2.2 indicates 
that E ⊂ F. 

The operations of forming unions, intersections, and complements of events 
obey certain rules similar to the rules of algebra. We list a few of these rules: 


Commutative laws     E ∪ F = F ∪ E                   EF = FE 
Associative laws     (E ∪ F) ∪ G = E ∪ (F ∪ G)       (EF)G = E(FG) 
Distributive laws    (E ∪ F)G = EG ∪ FG              EF ∪ G = (E ∪ G)(F ∪ G) 


These relations are verified by showing that any outcome that is contained in the 
event on the left side of the equality sign is also contained in the event on the 
right side, and vice versa. One way of showing this is by means of Venn diagrams. 
For instance, the distributive law may be verified by the sequence of diagrams in 
Figure 2.3. 
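
As a complement to the Venn-diagram argument, the laws can be checked mechanically on a small sample space by testing every triple of events. The sketch below does this for a hypothetical four-point sample space. Passing the check on one finite example is an illustration only and does not replace the element-by-element proof described above.

```python
from itertools import chain, combinations, product

S = frozenset(range(4))  # a small illustrative sample space

def all_events(space):
    """Return every subset of the sample space as a frozenset."""
    items = list(space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

events = all_events(S)

def laws_hold(E, F, G):
    """Check the commutative, associative, and distributive laws."""
    return (
        E | F == F | E and E & F == F & E
        and (E | F) | G == E | (F | G) and (E & F) & G == E & (F & G)
        and (E | F) & G == (E & G) | (F & G)
        and (E & F) | G == (E | G) & (F | G)
    )

all_ok = all(laws_hold(E, F, G) for E, F, G in product(events, repeat=3))
```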

Figure 2.1 Venn diagrams. (a) Shaded region: E ∪ F. (b) Shaded region: EF. (c) Shaded region: $E^c$. 


Figure 2.2 E ⊂ F. 


Figure 2.3 (E ∪ F)G = EG ∪ FG. (a) Shaded region: EG. (b) Shaded region: FG. (c) Shaded region: (E ∪ F)G. 

The following useful relationships among the three basic operations of forming 
unions, intersections, and complements are known as DeMorgan's laws: 


\[ \left( \bigcup_{i=1}^{n} E_i \right)^c = \bigcap_{i=1}^{n} E_i^c \]

\[ \left( \bigcap_{i=1}^{n} E_i \right)^c = \bigcup_{i=1}^{n} E_i^c \]


For instance, for two events E and F, DeMorgan's laws state that 

\[ (E \cup F)^c = E^c F^c \quad \text{and} \quad (EF)^c = E^c \cup F^c \]


which can be easily proven by using Venn diagrams (see Theoretical Exercise 7). 
To prove DeMorgan's laws for general n, suppose first that x is an outcome of 
$\left( \bigcup_{i=1}^{n} E_i \right)^c$. Then x is not contained in $\bigcup_{i=1}^{n} E_i$, 
which means that x is not contained in any of the events $E_i$, i = 1, 2, ..., n, 
implying that x is contained in $E_i^c$ for all i = 1, 2, ..., n and thus is contained 
in $\bigcap_{i=1}^{n} E_i^c$. To go the other way, suppose that x is an outcome of 
$\bigcap_{i=1}^{n} E_i^c$. Then x is contained in $E_i^c$ for all i = 1, 2, ..., n, which means 
that x is not contained in $E_i$ for any i = 1, 2, ..., n, implying that x is not contained 
in $\bigcup_{i=1}^{n} E_i$, in turn implying that x is contained in 
$\left( \bigcup_{i=1}^{n} E_i \right)^c$. This proves the first of DeMorgan's laws. 


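To prove the second of DeMorgan's laws, we use the first law to obtain 

\[ \left( \bigcup_{i=1}^{n} E_i^c \right)^c = \bigcap_{i=1}^{n} (E_i^c)^c \]

which, since $(E^c)^c = E$, is equivalent to 

\[ \left( \bigcup_{i=1}^{n} E_i^c \right)^c = \bigcap_{i=1}^{n} E_i \]

Taking complements of both sides of the preceding equation yields the result we 
seek, namely, 

\[ \bigcup_{i=1}^{n} E_i^c = \left( \bigcap_{i=1}^{n} E_i \right)^c \]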
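
DeMorgan's laws can be spot-checked on concrete sets. The following sketch uses a hypothetical ten-point sample space and three arbitrarily chosen events; it illustrates the two laws for this one instance rather than proving them.

```python
from functools import reduce

S = set(range(10))                      # illustrative sample space
events = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6, 7}]

def complement(E):
    """Outcomes of S that are not in E."""
    return S - E

union = reduce(set.union, events, set())
intersection = reduce(set.intersection, events, set(S))

# First law: the complement of the union is the intersection of complements.
first_law = complement(union) == reduce(
    set.intersection, (complement(E) for E in events), set(S)
)
# Second law: the complement of the intersection is the union of complements.
second_law = complement(intersection) == reduce(
    set.union, (complement(E) for E in events), set()
)
```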


2.3 Axioms of Probability 


One way of defining the probability of an event is in terms of its long run relative 
frequency. Such a definition usually goes as follows: We suppose that an experiment, 
whose sample space is S, is repeatedly performed under exactly the same conditions. 
For each event E of the sample space S, we define n(E) to be the number of times 
in the first n repetitions of the experiment that the event E occurs. Then P(E), the 
probability of the event E, is defined as 

\[ P(E) = \lim_{n \to \infty} \frac{n(E)}{n} \]

That is, P(E) is defined as the (limiting) proportion of time that E occurs. It is thus 
the limiting relative frequency of E. 

Although the preceding definition is certainly intuitively pleasing and should 
always be kept in mind by the reader, it possesses a serious drawback: How do we 
know that n(E)/n will converge to some constant limiting value that will be the same 
for each possible sequence of repetitions of the experiment? For example, suppose 
that the experiment to be repeatedly performed consists of flipping a coin. How do 
we know that the proportion of heads obtained in the first n flips will converge to 
some value as n gets large? Also, even if it does converge to some value, how do we 
know that, if the experiment is repeatedly performed a second time, we shall obtain 
the same limiting proportion of heads? 
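
The question can be explored numerically, though not settled, by simulation. The sketch below tracks n(E)/n for the event "heads" over two separately seeded runs of simulated fair coin flips. It assumes that Python's pseudorandom generator is an adequate stand-in for a fair coin, and the function name and seeds are illustrative. A simulation of finitely many flips cannot establish that a limit exists, which is exactly the difficulty raised above.

```python
import random

def running_proportion_of_heads(num_flips, seed):
    """Simulate fair coin flips; return n(E)/n after each flip."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for n in range(1, num_flips + 1):
        heads += rng.random() < 0.5   # True counts as 1
        proportions.append(heads / n)
    return proportions

# Two separate "sequences of repetitions" of the experiment.
run_a = running_proportion_of_heads(100_000, seed=1)
run_b = running_proportion_of_heads(100_000, seed=2)
print(run_a[-1], run_b[-1])
```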

Proponents of the relative frequency definition of probability usually answer 
this objection by stating that the convergence of n(E)/n to a constant limiting value 
is an assumption, or an axiom, of the system. However, to assume that n(E)/n will 
necessarily converge to some constant value seems to be an extraordinarily compli- 
cated assumption. For, although we might indeed hope that such a constant limiting 
frequency exists, it does not at all seem to be a priori evident that this need be the 
case. In fact, would it not be more reasonable to assume a set of simpler and more 
self-evident axioms about probability and then attempt to prove that such a con- 
stant limiting frequency does in some sense exist? The latter approach is the modern 
axiomatic approach to probability theory that we shall adopt in this text. In partic- 
ular, we shall assume that, for each event E in the sample space S, there exists a 
value P(E), referred to as the probability of E. We shall then assume that all these 
probabilities satisfy a certain set of axioms, which, we hope the reader will agree, is 
in accordance with our intuitive notion of probability. 

Consider an experiment whose sample space is S. For each event E of the sample 
space S, we assume that a number P(E) is defined and satisfies the following three 
axioms: 


The three axioms of probability 


Axiom 1 
\[ 0 \le P(E) \le 1 \]
Axiom 2 
\[ P(S) = 1 \]
Axiom 3 
For any sequence of mutually exclusive events $E_1, E_2, \ldots$ (that is, events for 
which $E_i E_j = \emptyset$ when $i \ne j$), 

\[ P\left( \bigcup_{i=1}^{\infty} E_i \right) = \sum_{i=1}^{\infty} P(E_i) \]


We refer to P(E) as the probability of the event E. 
Example 
3a 


Example 
3b 


Thus, Axiom 1 states that the probability that the outcome of the experiment is 
an outcome in E is some number between 0 and 1. Axiom 2 states that, with proba- 
bility 1, the outcome will be a point in the sample space S. Axiom 3 states that, for 
any sequence of mutually exclusive events, the probability of at least one of these 
events occurring is just the sum of their respective probabilities. 


If we consider a sequence of events $E_1, E_2, \ldots$, where $E_1 = S$ and $E_i = \emptyset$ for 
$i > 1$, then, because the events are mutually exclusive and because 
$S = \bigcup_{i=1}^{\infty} E_i$, we have, from Axiom 3, 

\[ P(S) = \sum_{i=1}^{\infty} P(E_i) = P(S) + \sum_{i=2}^{\infty} P(\emptyset) \]

implying that 

\[ P(\emptyset) = 0 \]

That is, the null event has probability 0 of occurring. 
Note that it follows that, for any finite sequence of mutually exclusive events 
$E_1, E_2, \ldots, E_n$, 

\[ P\left( \bigcup_{i=1}^{n} E_i \right) = \sum_{i=1}^{n} P(E_i) \qquad (3.1) \]

This equation follows from Axiom 3 by defining $E_i$ as the null event for all values 
of i greater than n. Axiom 3 is equivalent to Equation (3.1) when the sample space 
is finite. (Why?) However, the added generality of Axiom 3 is necessary when the 
sample space consists of an infinite number of points. 


If our experiment consists of tossing a coin and if we assume that a head is as likely 
to appear as a tail, then we would have 

\[ P(\{H\}) = P(\{T\}) = \frac{1}{2} \]

On the other hand, if the coin were biased and we believed that a head were twice 
as likely to appear as a tail, then we would have 

\[ P(\{H\}) = \frac{2}{3}, \qquad P(\{T\}) = \frac{1}{3} \]


If a die is rolled and we suppose that all six sides are equally likely to appear, then 
we would have P({1}) = P({2}) = P({3}) = P({4}) = P({5}) = P({6}) = 1/6. From 
Axiom 3, it would thus follow that the probability of rolling an even number would 
equal 

\[ P(\{2, 4, 6\}) = P(\{2\}) + P(\{4\}) + P(\{6\}) = \frac{1}{2} \]
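
For a finite sample space, a probability function can be sketched by assigning a probability to each outcome and summing over the outcomes in an event, which is how Equation (3.1) is used in the die example. The code below assumes the fair-die probabilities of that example and uses exact fractions; the names `point_prob` and `P` are illustrative.

```python
from fractions import Fraction

# Fair die: each of the six outcomes has probability 1/6 (the example's assumption).
S = frozenset(range(1, 7))
point_prob = {outcome: Fraction(1, 6) for outcome in S}

def P(event):
    """Probability of an event (a subset of S), by finite additivity."""
    return sum((point_prob[o] for o in event), Fraction(0))

even = {2, 4, 6}
odd = S - even
print(P(even))   # 1/2
```

With this definition, P(S) equals 1, the null event has probability 0, and the probability of a union of mutually exclusive events is the sum of their probabilities.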


The assumption of the existence of a set function P, defined on the events of 
a sample space S and satisfying Axioms 1, 2, and 3, constitutes the modern math- 
ematical approach to probability theory. It is hoped that the reader will agree that 
the axioms are natural and in accordance with our intuitive concept of probability as 
related to chance and randomness. Furthermore, using these axioms, we shall be able 
to prove that if an experiment is repeated over and over again, then, with probability 
1, the proportion of time during which any specific event E occurs will equal P(E). 
This result, known as the strong law of large numbers, is presented in Chapter 8. 
In addition, we present another possible interpretation of probability—as being a 
measure of belief—in Section 2.7. 


Technical Remark. We have supposed that P(E) is defined for all the events E 
of the sample space. Actually, when the sample space is an uncountably infinite set, 
P(E) is defined only for a class of events called measurable. However, this restriction 
need not concern us, as all events of any practical interest are measurable. 


2.4 Some Simple Propositions 


Proposition 
4.1 


Proposition 
4.2 


Proposition 
4.3 


In this section, we prove some simple propositions regarding probabilities. We first 
note that since E and $E^c$ are always mutually exclusive and since $E \cup E^c = S$, we 
have, by Axioms 2 and 3, 

\[ 1 = P(S) = P(E \cup E^c) = P(E) + P(E^c) \]

Or, equivalently, we have Proposition 4.1. 

\[ P(E^c) = 1 - P(E) \]


In words, Proposition 4.1 states that the probability that an event does not occur 
is 1 minus the probability that it does occur. For instance, if the probability of obtain- 
ing a head on the toss of a coin is 3/8, then the probability of obtaining a tail must be 5/8. 


Our second proposition states that if the event E is contained in the event F, 
then the probability of E is no greater than the probability of F. 


If E ⊂ F, then P(E) ≤ P(F). 
Proof Since E ⊂ F, it follows that we can express F as 

\[ F = E \cup E^c F \]

Hence, because E and $E^c F$ are mutually exclusive, we obtain, from Axiom 3, 

\[ P(F) = P(E) + P(E^c F) \]

which proves the result, since $P(E^c F) \ge 0$. 


Proposition 4.2 tells us, for instance, that the probability of rolling a 1 with a die 
is less than or equal to the probability of rolling an odd value with the die. 

The next proposition gives the relationship between the probability of the union 
of two events, expressed in terms of the individual probabilities, and the probability 
of the intersection of the events. 


\[ P(E \cup F) = P(E) + P(F) - P(EF) \]


Proof To derive a formula for P(E ∪ F), we first note that E ∪ F can be written as 
the union of the two disjoint events E and $E^c F$. Thus, from Axiom 3, we obtain 

\[ P(E \cup F) = P(E \cup E^c F) = P(E) + P(E^c F) \]

Furthermore, since $F = EF \cup E^c F$, we again obtain from Axiom 3 

\[ P(F) = P(EF) + P(E^c F) \]

or, equivalently, 

\[ P(E^c F) = P(F) - P(EF) \]

thereby completing the proof. 


Proposition 4.3 could also have been proved by making use of the Venn diagram 
in Figure 2.4. 

Let us divide E ∪ F into three mutually exclusive sections, as shown in Figure 2.5. 
In words, section I represents all the points in E that are not in F (that is, $EF^c$), 
section II represents all points both in E and in F (that is, EF), and section III rep- 
resents all points in F that are not in E (that is, $E^c F$). 

From Figure 2.5, we see that 

E ∪ F = I ∪ II ∪ III 
E = I ∪ II 
F = II ∪ III 

As I, II, and III are mutually exclusive, it follows from Axiom 3 that 

P(E ∪ F) = P(I) + P(II) + P(III) 
P(E) = P(I) + P(II) 
P(F) = P(II) + P(III) 

which shows that 

P(E ∪ F) = P(E) + P(F) − P(II) 

and Proposition 4.3 is proved, since II = EF. 


Figure 2.4 Venn diagram. 


Figure 2.5 Venn diagram in sections. 

J is taking two books along on her holiday vacation. With probability .5, she will like 
the first book; with probability .4, she will like the second book; and with probabil- 
ity .3, she will like both books. What is the probability that she likes neither book? 


Solution Let $B_i$ denote the event that J likes book i, i = 1, 2. Then the probability 
that she likes at least one of the books is 

\[ P(B_1 \cup B_2) = P(B_1) + P(B_2) - P(B_1 B_2) = .5 + .4 - .3 = .6 \]

Because the event that J likes neither book is the complement of the event that she 
likes at least one of them, we obtain the result 

\[ P(B_1^c B_2^c) = P((B_1 \cup B_2)^c) = 1 - P(B_1 \cup B_2) = .4 \]
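
The arithmetic of this solution can be reproduced with exact fractions, which avoids floating-point rounding in sums such as .5 + .4 − .3. The variable names in this sketch are illustrative.

```python
from fractions import Fraction

p_b1 = Fraction(5, 10)     # P(B1): she likes the first book
p_b2 = Fraction(4, 10)     # P(B2): she likes the second book
p_both = Fraction(3, 10)   # P(B1 B2): she likes both books

p_at_least_one = p_b1 + p_b2 - p_both   # Proposition 4.3
p_neither = 1 - p_at_least_one          # Proposition 4.1

print(p_at_least_one, p_neither)   # 3/5 2/5
```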


We may also calculate the probability that any one of the three events E, F, and 
G occurs, namely, 

\[ P(E \cup F \cup G) = P[(E \cup F) \cup G] \]

which, by Proposition 4.3, equals 

\[ P(E \cup F) + P(G) - P[(E \cup F)G] \]

Now, it follows from the distributive law that the events (E ∪ F)G and EG ∪ FG 
are equivalent; hence, from the preceding equations, we obtain 

\[
\begin{aligned}
P(E \cup F \cup G) &= P(E) + P(F) - P(EF) + P(G) - P(EG \cup FG) \\
&= P(E) + P(F) - P(EF) + P(G) - P(EG) - P(FG) + P(EGFG) \\
&= P(E) + P(F) + P(G) - P(EF) - P(EG) - P(FG) + P(EFG)
\end{aligned}
\]


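In fact, the following proposition, known as the inclusion-exclusion identity, can 
be proved by mathematical induction: 

\[
\begin{aligned}
P(E_1 \cup E_2 \cup \cdots \cup E_n) = {} & \sum_{i=1}^{n} P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) + \cdots \\
& + (-1)^{r+1} \sum_{i_1 < i_2 < \cdots < i_r} P(E_{i_1} E_{i_2} \cdots E_{i_r}) \\
& + \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n)
\end{aligned}
\]

The summation $\sum_{i_1 < i_2 < \cdots < i_r} P(E_{i_1} E_{i_2} \cdots E_{i_r})$ is taken over all of the 
$\binom{n}{r}$ possible subsets of size r of the set {1, 2, ..., n}. 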


In words, Proposition 4.4 states that the probability of the union of n events 
equals the sum of the probabilities of these events taken one at a time, minus the 
sum of the probabilities of these events taken two at a time, plus the sum of the 
probabilities of these events taken three at a time, and so on. 
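
The identity can be checked numerically on a finite sample space. The sketch below assumes equally likely outcomes, so that a probability is a ratio of counts, and uses three illustrative events on a pair of dice that do not come from the text. It evaluates the right-hand side term by term and compares it with the probability of the union computed directly.

```python
from fractions import Fraction
from itertools import combinations

def prob(event, space):
    """P(event) when all outcomes in the finite space are equally likely."""
    return Fraction(len(event & space), len(space))

def inclusion_exclusion(events, space):
    """Right-hand side of the inclusion-exclusion identity."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        for group in combinations(events, r):
            total += sign * prob(frozenset.intersection(*group), space)
    return total

# Illustrative events on two dice (not from the text).
S = frozenset((i, j) for i in range(1, 7) for j in range(1, 7))
E1 = frozenset(o for o in S if o[0] == 6)        # first die shows 6
E2 = frozenset(o for o in S if o[1] == 6)        # second die shows 6
E3 = frozenset(o for o in S if sum(o) >= 10)     # sum is at least 10

lhs = prob(E1 | E2 | E3, S)
rhs = inclusion_exclusion([E1, E2, E3], S)
```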


Remarks 1. For a noninductive argument for Proposition 4.4, note first that if an 
outcome of the sample space is not a member of any of the sets $E_i$, then its probabil- 
ity does not contribute anything to either side of the equality. Now, suppose that an 
outcome is in exactly m of the events $E_i$, where m > 0. Then, since it is in $\bigcup_i E_i$, its 
probability is counted once in $P\left( \bigcup_i E_i \right)$; also, as this outcome is contained in 
$\binom{m}{k}$ subsets of the type $E_{i_1} E_{i_2} \cdots E_{i_k}$, its probability is counted 

\[ \binom{m}{1} - \binom{m}{2} + \binom{m}{3} - \cdots \pm \binom{m}{m} \]

times on the right of the equality sign in Proposition 4.4. Thus, for m > 0, we must 
show that 

\[ 1 = \binom{m}{1} - \binom{m}{2} + \binom{m}{3} - \cdots \pm \binom{m}{m} \]

However, since $1 = \binom{m}{0}$, the preceding equation is equivalent to 

\[ \sum_{i=0}^{m} \binom{m}{i} (-1)^i = 0 \]

and the latter equation follows from the binomial theorem, since 

\[ 0 = (-1 + 1)^m = \sum_{i=0}^{m} \binom{m}{i} (-1)^i (1)^{m-i} \]

2. The following is a succinct way of writing the inclusion-exclusion identity: 

\[ P\left( \bigcup_{i=1}^{n} E_i \right) = \sum_{r=1}^{n} (-1)^{r+1} \sum_{i_1 < \cdots < i_r} P(E_{i_1} \cdots E_{i_r}) \]


3. In the inclusion-exclusion identity, going out one term results in an upper 
bound on the probability of the union, going out two terms results in a lower bound 
on the probability, going out three terms results in an upper bound on the proba- 
bility, going out four terms results in a lower bound, and so on. That is, for events 
$E_1, \ldots, E_n$, we have 

\[ P\left( \bigcup_{i=1}^{n} E_i \right) \le \sum_{i=1}^{n} P(E_i) \qquad (4.1) \]

\[ P\left( \bigcup_{i=1}^{n} E_i \right) \ge \sum_{i=1}^{n} P(E_i) - \sum_{j<i} P(E_i E_j) \qquad (4.2) \]

\[ P\left( \bigcup_{i=1}^{n} E_i \right) \le \sum_{i=1}^{n} P(E_i) - \sum_{j<i} P(E_i E_j) + \sum_{k<j<i} P(E_i E_j E_k) \qquad (4.3) \]

and so on. To prove the validity of these bounds, note the identity 

\[ \bigcup_{i=1}^{n} E_i = E_1 \cup E_1^c E_2 \cup E_1^c E_2^c E_3 \cup \cdots \cup E_1^c \cdots E_{n-1}^c E_n \]

That is, at least one of the events $E_i$ occurs if $E_1$ occurs, or if $E_1$ does not occur but 
$E_2$ does, or if $E_1$ and $E_2$ do not occur but $E_3$ does, and so on. Because the right-hand 
side is the union of disjoint events, we obtain 
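\[
\begin{aligned}
P\left( \bigcup_{i=1}^{n} E_i \right) &= P(E_1) + P(E_1^c E_2) + P(E_1^c E_2^c E_3) + \cdots + P(E_1^c \cdots E_{n-1}^c E_n) \\
&= P(E_1) + \sum_{i=2}^{n} P(E_1^c \cdots E_{i-1}^c E_i) \qquad (4.4)
\end{aligned}
\]

Now, let $B_i = E_1^c \cdots E_{i-1}^c = \left( \bigcup_{j<i} E_j \right)^c$ be the event that none of the first i − 1 
events occurs. Applying the identity 

\[ P(E_i) = P(B_i E_i) + P(B_i^c E_i) \]

shows that 

\[ P(E_i) = P(E_1^c \cdots E_{i-1}^c E_i) + P\Bigl( E_i \bigcup_{j<i} E_j \Bigr) \]

or, equivalently, 

\[ P(E_1^c \cdots E_{i-1}^c E_i) = P(E_i) - P\Bigl( \bigcup_{j<i} E_i E_j \Bigr) \]

Substituting this equation into (4.4) yields 

\[ P\left( \bigcup_{i=1}^{n} E_i \right) = \sum_{i} P(E_i) - \sum_{i} P\Bigl( \bigcup_{j<i} E_i E_j \Bigr) \qquad (4.5) \]

Because probabilities are always nonnegative, Inequality (4.1) follows directly from 
Equation (4.5). Now, fixing i and applying Inequality (4.1) to $P\bigl( \bigcup_{j<i} E_i E_j \bigr)$ yields 

\[ P\Bigl( \bigcup_{j<i} E_i E_j \Bigr) \le \sum_{j<i} P(E_i E_j) \]

which, by Equation (4.5), gives Inequality (4.2). Similarly, fixing i and applying 
Inequality (4.2) to $P\bigl( \bigcup_{j<i} E_i E_j \bigr)$ yields 

\[
\begin{aligned}
P\Bigl( \bigcup_{j<i} E_i E_j \Bigr) &\ge \sum_{j<i} P(E_i E_j) - \sum_{k<j<i} P(E_i E_j E_i E_k) \\
&= \sum_{j<i} P(E_i E_j) - \sum_{k<j<i} P(E_i E_j E_k)
\end{aligned}
\]

which, by Equation (4.5), gives Inequality (4.3). The next inclusion-exclusion 
inequality is now obtained by fixing i and applying Inequality (4.3) to $P\bigl( \bigcup_{j<i} E_i E_j \bigr)$, 
and so on. 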

The first inclusion-exclusion inequality, namely that 

\[ P\left( \bigcup_{i=1}^{n} E_i \right) \le \sum_{i=1}^{n} P(E_i) \]


is known as Boole’s inequality. 
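
The alternating upper and lower bounds can be observed numerically by computing the partial sums of the inclusion-exclusion series. The sketch below assumes a hypothetical sample space of 12 equally likely outcomes and four arbitrarily chosen events; none of these choices comes from the text.

```python
from fractions import Fraction
from itertools import combinations

def partial_sums(events, space):
    """Partial sums of the inclusion-exclusion series, one per term count."""
    sums, total = [], Fraction(0)
    for r in range(1, len(events) + 1):
        term = sum(
            Fraction(len(frozenset.intersection(*g)), len(space))
            for g in combinations(events, r)
        )
        total += (-1) ** (r + 1) * term
        sums.append(total)
    return sums

S = frozenset(range(12))                       # equally likely outcomes
events = [frozenset(range(0, 6)), frozenset(range(4, 9)),
          frozenset(range(5, 11)), frozenset({0, 5, 11})]
union_prob = Fraction(len(frozenset.union(*events)), len(S))
bounds = partial_sums(events, S)

# Odd numbers of terms give upper bounds, even numbers give lower bounds.
alternates = all(
    (b >= union_prob) if r % 2 == 1 else (b <= union_prob)
    for r, b in enumerate(bounds, start=1)
)
```

The first partial sum is the bound in Boole's inequality, and the last one equals the probability of the union exactly.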


2.5 Sample Spaces Having Equally Likely Outcomes 


In many experiments, it is natural to assume that all outcomes in the sample space 
are equally likely to occur. That is, consider an experiment whose sample space S is 
a finite set, say, S = {1,2,...,N}. Then, it is often natural to assume that 


\[ P(\{1\}) = P(\{2\}) = \cdots = P(\{N\}) \]

which implies, from Axioms 2 and 3 (why?), that 

\[ P(\{i\}) = \frac{1}{N}, \qquad i = 1, 2, \ldots, N \]

From this equation, it follows from Axiom 3 that, for any event E, 

\[ P(E) = \frac{\text{number of outcomes in } E}{\text{number of outcomes in } S} \]

In words, if we assume that all outcomes of an experiment are equally likely to occur, 
then the probability of any event E equals the proportion of outcomes in the sample 
space that are contained in E. 


If two dice are rolled, what is the probability that the sum of the upturned faces will 
equal 7? 


Solution We shall solve this problem under the assumption that all of the 36 possible 
outcomes are equally likely. Since there are 6 possible outcomes, namely (1, 6), 
(2, 5), (3, 4), (4, 3), (5, 2), and (6, 1), that result in the sum of the dice being equal 
to 7, the desired probability is $\frac{6}{36} = \frac{1}{6}$. 
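
The count in this solution is easy to reproduce by enumerating the 36 outcomes. The following sketch does so under the same assumption of equally likely outcomes.

```python
from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]   # 36 outcomes
E = [o for o in S if sum(o) == 7]                        # sum equals 7

p_sum_7 = Fraction(len(E), len(S))
print(p_sum_7)   # 1/6
```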


If 3 balls are “randomly drawn” from a bowl containing 6 white and 5 black balls, 
what is the probability that one of the balls is white and the other two black? 


Solution If we regard the balls as being distinguishable and the order in which they 
are selected as being relevant, then the sample space consists of 11 · 10 · 9 = 990 
outcomes. Furthermore, there are 6 · 5 · 4 = 120 outcomes in which the first ball 
selected is white and the other two are black; 5 · 6 · 4 = 120 outcomes in which 
the first is black, the second is white, and the third is black; and 5 · 4 · 6 = 120 in 
which the first two are black and the third is white. Hence, assuming that “randomly 
drawn” means that each outcome in the sample space is equally likely to occur, we 
see that the desired probability is 

\[ \frac{120 + 120 + 120}{990} = \frac{4}{11} \]

This problem could also have been solved by regarding the outcome of the 
experiment as the unordered set of drawn balls. From this point of view, there are 
$\binom{11}{3} = 165$ outcomes in the sample space. Now, each set of 3 balls corresponds 
to 3! outcomes when the order of selection is noted. As a result, if all outcomes 
are assumed equally likely when the order of selection is noted, then it follows that 
they remain equally likely when the outcome is taken to be the unordered set of 
selected balls. Hence, using the latter representation of the experiment, we see that 
the desired probability is 

\[ \frac{\binom{6}{1} \binom{5}{2}}{\binom{11}{3}} = \frac{4}{11} \]

which, of course, agrees with the answer obtained previously. 
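
Both viewpoints in this solution can be checked by brute-force enumeration. The sketch below labels the 11 balls so that they are distinguishable, lists all ordered and all unordered draws of 3, and compares the resulting proportions with the counting formula. The helper name `one_white_two_black` is illustrative.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb

balls = ["w"] * 6 + ["b"] * 5          # 6 white, 5 black
labeled = list(enumerate(balls))       # make the balls distinguishable

def one_white_two_black(draw):
    """True if the draw contains exactly 1 white and 2 black balls."""
    return sorted(color for _, color in draw) == ["b", "b", "w"]

ordered = list(permutations(labeled, 3))     # order of selection noted
unordered = list(combinations(labeled, 3))   # unordered sets of balls

p_ordered = Fraction(sum(map(one_white_two_black, ordered)), len(ordered))
p_unordered = Fraction(sum(map(one_white_two_black, unordered)), len(unordered))
p_formula = Fraction(comb(6, 1) * comb(5, 2), comb(11, 3))
```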


Example 
5c 


Example 
When the experiment consists of a random selection of k items from a set of n 
items, we have the flexibility of either letting the outcome of the experiment be the 
ordered selection of the k items or letting it be the unordered set of items selected. 
In the former case, we would assume that each new selection is equally likely to be 
any of the so far unselected items of the set, and in the latter case, we would assume 
that all $\binom{n}{k}$ possible subsets of k items are equally likely to be the set selected. For 
instance, suppose 5 people are to be randomly selected from a group of 20 individu- 
als consisting of 10 married couples, and we want to determine P(N), the probability 
that the 5 chosen are all unrelated. (That is, no two are married to each other.) 
If we regard the sample space as the set of 5 people chosen, then there are $\binom{20}{5}$ 
equally likely outcomes. An outcome that does not contain a married couple can 
be thought of as being the result of a six-stage experiment: In the first stage, 5 of 
the 10 couples to have a member in the group are chosen; in the next 5 stages, 1 
of the 2 members of each of these couples is selected. Thus, there are $\binom{10}{5} 2^5$ possi- 
ble outcomes in which the 5 members selected are unrelated, yielding the desired 
probability of 

\[ P(N) = \frac{\binom{10}{5} 2^5}{\binom{20}{5}} \]

In contrast, we could let the outcome of the experiment be the ordered selection 
of the 5 individuals. In this setting, there are 20 · 19 · 18 · 17 · 16 equally likely 
outcomes, of which 20 · 18 · 16 · 14 · 12 outcomes result in a group of 5 unrelated 
individuals, yielding the result 

\[ P(N) = \frac{20 \cdot 18 \cdot 16 \cdot 14 \cdot 12}{20 \cdot 19 \cdot 18 \cdot 17 \cdot 16} \]


We leave it for the reader to verify that the two answers are identical. 
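
One way to carry out this verification is to evaluate both expressions with exact rational arithmetic. The sketch below uses `math.comb` and `math.prod` (available in Python 3.8 and later); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, prod

# Unordered sample space: choose 5 couples, then 1 member of each.
p_unordered = Fraction(comb(10, 5) * 2**5, comb(20, 5))

# Ordered sample space: each pick must avoid earlier picks and their spouses.
p_ordered = Fraction(prod([20, 18, 16, 14, 12]), prod([20, 19, 18, 17, 16]))

print(p_unordered, p_ordered)   # 168/323 168/323
```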


A committee of 5 is to be selected from a group of 6 men and 9 women. If the 
selection is made randomly, what is the probability that the committee consists of 3 
men and 2 women? 


Solution Because each of the $\binom{15}{5}$ possible committees is equally likely to be selected, 
the desired probability is 

\[ \frac{\binom{6}{3} \binom{9}{2}}{\binom{15}{5}} = \frac{240}{1001} \]


An urn contains n balls, one of which is special. If k of these balls are withdrawn one 
at a time, with each selection being equally likely to be any of the balls that remain 
at the time, what is the probability that the special ball is chosen? 
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Solution Since all of the balls are treated in an identical manner, it follows that the 
n 


k sets of k balls. Therefore, 


set of k balls selected is equally likely to be any of the 


P{special ball is selected} = = 


We could also have obtained this result by letting A; denote the event that the special 
ball is the ith ball to be chosen, i = 1,...,k. Then, since each one of the n balls is 
equally likely to be the ith ball chosen, it follows that P(A;) = 1/n. Hence, because 
these events are clearly mutually exclusive, we have 


k k 
k 
P ial ball is selected} = P Ai} = >> P(A) = — 
{special ball is selected} U F 2 (Aj) A 


We could also have argued that $P(A_i) = 1/n$, by noting that there are
$n(n-1)\cdots(n-k+1) = n!/(n-k)!$ equally likely outcomes of the experiment, of which
$(n-1)(n-2)\cdots(n-i+1)(1)(n-i)\cdots(n-k+1) = (n-1)!/(n-k)!$ result
in the special ball being the ith one chosen. From this reasoning, it follows that

$$P(A_i) = \frac{(n-1)!}{n!} = \frac{1}{n}$$
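
Both arguments can be checked by brute force for small n and k. The sketch below is illustrative only: the function names are hypothetical, and ball 0 is arbitrarily taken to be the special ball. It lists every ordered draw of k balls and counts the favorable ones.

```python
from fractions import Fraction
from itertools import permutations

def prob_special_chosen(n, k):
    """Exact P(special ball is among the k drawn), by listing every ordered draw."""
    draws = list(permutations(range(n), k))  # n!/(n-k)! equally likely outcomes
    hits = sum(1 for draw in draws if 0 in draw)  # ball 0 is the special ball
    return Fraction(hits, len(draws))

def prob_special_at_position(n, k, i):
    """Exact P(A_i): the special ball is the i-th ball drawn (i is 1-based)."""
    draws = list(permutations(range(n), k))
    hits = sum(1 for draw in draws if draw[i - 1] == 0)
    return Fraction(hits, len(draws))

print(prob_special_chosen(6, 3))
print([prob_special_at_position(6, 3, i) for i in (1, 2, 3)])
```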


Suppose that n + m balls, of which n are red and m are blue, are arranged in a linear 
order in such a way that all (n + m)! possible orderings are equally likely. If we
record the result of this experiment by listing only the colors of the successive balls, 
show that all the possible results remain equally likely. 


Solution Consider any one of the (n + m)! possible orderings, and note that any
permutation of the red balls among themselves and of the blue balls among themselves
does not change the sequence of colors. As a result, every ordering of colorings
corresponds to n! m! different orderings of the n + m balls, so every ordering of the
colors has probability $\frac{n!\,m!}{(n+m)!}$ of occurring.

For example, suppose that there are 2 red balls, numbered r1, r2, and 2 blue balls,
numbered b1, b2. Then, of the 4! possible orderings, there will be 2! 2! orderings that
result in any specified color combination. For instance, the following orderings result
in the successive balls alternating in color, with a red ball first:

r1, b1, r2, b2    r1, b2, r2, b1    r2, b1, r1, b2    r2, b2, r1, b1

Therefore, each of the possible orderings of the colors has probability
$\frac{4}{24} = \frac{1}{6}$ of occurring.
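
The 2-red, 2-blue case is small enough to enumerate. The following sketch (variable names are illustrative) lists all 4! orderings of the numbered balls and tallies how many produce each color sequence.

```python
from collections import Counter
from itertools import permutations

balls = ["r1", "r2", "b1", "b2"]

# Record only the color (first character) of each ball in every ordering.
pattern_counts = Counter(
    "".join(ball[0] for ball in order) for order in permutations(balls)
)

for pattern, count in sorted(pattern_counts.items()):
    print(pattern, count)
```

Each of the six color sequences arises from the same number of orderings, which is what makes them equally likely.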


A poker hand consists of 5 cards. If the cards have distinct consecutive values and 
are not all of the same suit, we say that the hand is a straight. For instance, a hand 
consisting of the five of spades, six of spades, seven of spades, eight of spades, and 
nine of hearts is a straight. What is the probability that one is dealt a straight? 


Solution We start by assuming that all $\binom{52}{5}$ possible poker hands are equally
likely. To determine the number of outcomes that are straights, let us first
determine the number of possible outcomes for which the poker hand consists of an ace,
two, three, four, and five (the suits being irrelevant). Since the ace can be any 1 of the
4 possible aces, and similarly for the two, three, four, and five, it follows that there
are $4^5$ outcomes leading to exactly one ace, two, three, four, and five. Hence, since
in 4 of these outcomes all the cards will be of the same suit (such a hand is called a
straight flush), it follows that there are $4^5 - 4$ hands that make up a straight of the
form ace, two, three, four, and five. Similarly, there are $4^5 - 4$ hands that make up a
straight of the form ten, jack, queen, king, and ace. Thus, there are $10(4^5 - 4)$ hands
that are straights, and it follows that the desired probability is

$$\frac{10(4^5 - 4)}{\binom{52}{5}} \approx .0039$$

A 5-card poker hand is said to be a full house if it consists of 3 cards of the same
denomination and 2 other cards of the same denomination (of course, different from
the first denomination). Thus, a full house is three of a kind plus a pair. What is the
probability that one is dealt a full house?

Solution Again, we assume that all $\binom{52}{5}$ possible hands are equally likely. To
determine the number of possible full houses, we first note that there are
$\binom{4}{2}\binom{4}{3}$ different combinations of, say, 2 tens and 3 jacks. Because
there are 13 different choices for the kind of pair and, after a pair has been chosen,
there are 12 other choices for the denomination of the remaining 3 cards, it follows
that the probability of a full house is

$$\frac{13 \cdot 12 \cdot \binom{4}{2}\binom{4}{3}}{\binom{52}{5}} \approx .0014$$
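
The two poker counts are easy to evaluate numerically. This is a small sketch assuming Python 3.8 or later for `math.comb`; the variable names are illustrative.

```python
from math import comb

hands = comb(52, 5)                                # all 5-card hands
straights = 10 * (4**5 - 4)                        # 10 value ranges, minus straight flushes
full_houses = 13 * 12 * comb(4, 2) * comb(4, 3)    # pair kind, triple kind, suits

p_straight = straights / hands
p_full_house = full_houses / hands

print(straights, full_houses, hands)
print(round(p_straight, 4), round(p_full_house, 4))
```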


In the game of bridge, the entire deck of 52 cards is dealt out to 4 players. What is 
the probability that 


(a) one of the players receives all 13 spades; 


(b) each player receives 1 ace? 


Solution (a) Letting $E_i$ be the event that hand i has all 13 spades, then

$$P(E_i) = \frac{1}{\binom{52}{13}}, \quad i = 1, 2, 3, 4$$

Because the events $E_i$, i = 1, 2, 3, 4, are mutually exclusive, the probability that one
of the hands is dealt all 13 spades is

$$P\left(\bigcup_{i=1}^{4} E_i\right) = \sum_{i=1}^{4} P(E_i) = \frac{4}{\binom{52}{13}} \approx 6.3 \times 10^{-12}$$


(b) Let the outcome of the experiment be the sets of 13 cards of each of the
players 1, 2, 3, 4. To determine the number of outcomes in which each of the
distinct players receives exactly 1 ace, put aside the aces and note that there are
$\binom{48}{12,12,12,12}$ possible divisions of the other 48 cards when each player is to
receive 12. Because there are 4! ways of dividing the 4 aces so that each player
receives 1, we see that the number of possible outcomes in which each player receives
exactly 1 ace is $4!\binom{48}{12,12,12,12}$.

As there are $\binom{52}{13,13,13,13}$ possible hands, the desired probability is thus

$$\frac{4!\binom{48}{12,12,12,12}}{\binom{52}{13,13,13,13}} \approx .1055$$
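
Both bridge probabilities can be evaluated exactly. The sketch below defines a small `multinomial` helper, which is a hypothetical name introduced here and not a standard-library function, and uses exact fractions throughout.

```python
from fractions import Fraction
from math import comb, factorial

def multinomial(n, parts):
    """Number of ways to divide n items into ordered groups of the given sizes."""
    assert sum(parts) == n
    result = factorial(n)
    for size in parts:
        result //= factorial(size)  # each intermediate quotient is an integer
    return result

# (a) some player holds all 13 spades
p_all_spades = Fraction(4, comb(52, 13))

# (b) each player receives exactly one ace
p_one_ace_each = Fraction(
    factorial(4) * multinomial(48, [12] * 4),
    multinomial(52, [13] * 4),
)

print(float(p_all_spades), float(p_one_ace_each))
```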


Some results in probability are quite surprising when initially encountered. Our 
next two examples illustrate this phenomenon. 


If n people are present in a room, what is the probability that no two of them
celebrate their birthday on the same day of the year? How large need n be so that this
probability is less than 1/2?

Solution As each person can celebrate his or her birthday on any one of 365 days,
there are a total of $(365)^n$ possible outcomes. (We are ignoring the possibility of
someone having been born on February 29.) Assuming that each outcome is equally
likely, we see that the desired probability is
$(365)(364)(363)\cdots(365 - n + 1)/(365)^n$.
It is a rather surprising fact that when n = 23, this probability is less than 1/2. That
is, if there are 23 or more people in a room, then the probability that at least two of
them have the same birthday exceeds 1/2. Many people are initially surprised by this
result, since 23 seems so small in relation to 365, the number of days of the year.
However, every pair of individuals has probability $\frac{365}{(365)^2} = \frac{1}{365}$
of having the same birthday, and in a group of 23 people, there are
$\binom{23}{2} = 253$ different pairs of individuals.
Looked at this way, the result no longer seems so surprising.

When there are 50 people in the room, the probability that at least two share the
same birthday is approximately .970, and with 100 persons in the room, the odds are
better than 3,000,000:1. (That is, the probability is greater than
$\frac{3 \times 10^6}{3 \times 10^6 + 1}$ that at least two people have the same birthday.)
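
The numbers quoted above can be reproduced with a few lines of code. This is a minimal sketch using floating-point arithmetic; the function name `prob_all_distinct` is illustrative and not from the text.

```python
def prob_all_distinct(n, days=365):
    """P(no two of n people share a birthday), all days equally likely."""
    p = 1.0
    for j in range(n):
        p *= (days - j) / days
    return p

# Smallest group size for which a shared birthday is more likely than not.
smallest_n = next(n for n in range(1, 366) if prob_all_distinct(n) < 0.5)

print(smallest_n)
print(1 - prob_all_distinct(50))
print(prob_all_distinct(100))
```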


A deck of 52 playing cards is shuffled, and the cards are turned up one at a time until 
the first ace appears. Is the next card—that is, the card following the first ace —more 
likely to be the ace of spades or the two of clubs? 


Solution To determine the probability that the card following the first ace is the
ace of spades, we need to calculate how many of the (52)! possible orderings of the 
cards have the ace of spades immediately following the first ace. To begin, note that 
each ordering of the 52 cards can be obtained by first ordering the 51 cards different 
from the ace of spades and then inserting the ace of spades into that ordering. Fur- 
thermore, for each of the (51)! orderings of the other cards, there is only one place 
where the ace of spades can be placed so that it follows the first ace. For instance, if 
the ordering of the other 51 cards is 


4c, 6h, Jd, 5s, Ac, 7d,..., Kh 


then the only insertion of the ace of spades into this ordering that results in its fol- 
lowing the first ace is 


4c, 6h, Jd, 5s, Ac, As, 7d,...,Kh 


Therefore, there are (51)! orderings that result in the ace of spades following the first
ace, so

$$P\{\text{the ace of spades follows the first ace}\} = \frac{(51)!}{(52)!} = \frac{1}{52}$$


In fact, by exactly the same argument, it follows that the probability that the two
of clubs (or any other specified card) follows the first ace is also 1/52. In other words,
each of the 52 cards of the deck is equally likely to be the one that follows the first 
ace! 

Many people find this result rather surprising. Indeed, a common reaction is to 
suppose initially that it is more likely that the two of clubs (rather than the ace of 
spades) follows the first ace, since that first ace might itself be the ace of spades. This 
reaction is often followed by the realization that the two of clubs might itself appear 
before the first ace, thus negating its chance of immediately following the first ace. 
However, as there is one chance in four that the ace of spades will be the first ace 
(because all 4 aces are equally likely to be first) and only one chance in five that 
the two of clubs will appear before the first ace (because each of the set of 5 cards 
consisting of the two of clubs and the 4 aces is equally likely to be the first of this set 
to appear), it again appears that the two of clubs is more likely. However, this is not 
the case, and our more complete analysis shows that they are equally likely.
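
The insertion argument does not depend on the deck having 52 cards, so it can be checked exhaustively on a small deck. The sketch below uses a hypothetical 6-card deck containing 2 aces (the card names and the function name are illustrative) and computes, for every card, the exact probability that it immediately follows the first ace.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

ACES = {"As", "Ah"}
DECK = ["As", "Ah", "2c", "3d", "Kh", "7s"]  # hypothetical small deck, 2 aces

def follower_distribution(deck, aces):
    """Exact P(card immediately follows the first ace) for each card in the deck."""
    counts = Counter()
    total = 0
    for order in permutations(deck):
        total += 1
        first_ace = next(i for i, card in enumerate(order) if card in aces)
        if first_ace + 1 < len(order):  # the first ace may be the last card
            counts[order[first_ace + 1]] += 1
    return {card: Fraction(counts[card], total) for card in deck}

dist = follower_distribution(DECK, ACES)
print(dist)
```

With 6 cards, every card (ace or not) follows the first ace with probability 1/6, mirroring the 1/52 result for the full deck.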


A football team consists of 20 offensive and 20 defensive players. The players are to
be paired in groups of 2 for the purpose of determining roommates. If the pairing is
done at random, what is the probability that there are no offensive-defensive
roommate pairs? What is the probability that there are 2i offensive-defensive roommate
pairs, i = 1, 2, ..., 10?

Solution There are

$$\binom{40}{2, 2, \ldots, 2} = \frac{(40)!}{(2!)^{20}}$$

ways of dividing the 40 players into 20 ordered pairs of two each. (That is, there
are $(40)!/2^{20}$ ways of dividing the players into a first pair, a second pair, and so on.)
Hence, there are $(40)!/[2^{20}(20)!]$ ways of dividing the players into (unordered) pairs of
2 each. Furthermore, since a division will result in no offensive-defensive pairs if the
offensive (and defensive) players are paired among themselves, it follows that there
are $[(20)!/(2^{10}(10)!)]^2$ such divisions. Hence, the probability of no offensive-defensive
roommate pairs, call it $P_0$, is given by

$$P_0 = \frac{\left(\dfrac{(20)!}{2^{10}(10)!}\right)^2}{\dfrac{(40)!}{2^{20}(20)!}} = \frac{[(20)!]^3}{[(10)!]^2 (40)!}$$

To determine $P_{2i}$, the probability that there are 2i offensive-defensive pairs, we first
note that there are $\binom{20}{2i}^2$ ways of selecting the 2i offensive players and the 2i
defensive players who are to be in the offensive-defensive pairs. These 4i players can then
be paired up into (2i)! possible offensive-defensive pairs. (This is so because the
first offensive player can be paired with any of the 2i defensive players, the second
offensive player with any of the remaining 2i − 1 defensive players, and so on.)
As the remaining 20 − 2i offensive (and defensive) players must be paired among
themselves, it follows that there are

$$\binom{20}{2i}^2 (2i)! \left[\frac{(20 - 2i)!}{2^{10-i}(10 - i)!}\right]^2$$

divisions that lead to 2i offensive-defensive pairs. Hence,

$$P_{2i} = \frac{\binom{20}{2i}^2 (2i)! \left[\dfrac{(20 - 2i)!}{2^{10-i}(10 - i)!}\right]^2}{\dfrac{(40)!}{2^{20}(20)!}}, \quad i = 0, 1, \ldots, 10$$

The $P_{2i}$, i = 0, 1, ..., 10, can now be computed, or they can be approximated by
making use of a result of Stirling, which shows that n! can be approximated by
$n^{n+1/2} e^{-n} \sqrt{2\pi}$. For instance, we obtain

$$P_0 \approx 1.3403 \times 10^{-6}, \qquad P_{10} \approx .345861, \qquad P_{20} \approx 7.6068 \times 10^{-6}$$
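
Rather than using Stirling's approximation, the $P_{2i}$ can be computed exactly with integer arithmetic. The following sketch assumes Python 3.8 or later; the helper names `unordered_pairings` and `p_od_pairs` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction
from math import comb, factorial

def unordered_pairings(players):
    """Ways to split an even number of players into unordered pairs."""
    k = players // 2
    return factorial(players) // (2**k * factorial(k))

def p_od_pairs(i):
    """Exact probability of exactly 2i offensive-defensive roommate pairs."""
    favorable = (
        comb(20, 2 * i) ** 2                   # who is in the mixed pairs
        * factorial(2 * i)                     # how the mixed pairs are matched
        * unordered_pairings(20 - 2 * i) ** 2  # remaining players paired by side
    )
    return Fraction(favorable, unordered_pairings(40))

probabilities = [p_od_pairs(i) for i in range(11)]
print(sum(probabilities))
for i in (0, 5, 10):
    print(2 * i, float(probabilities[i]))
```

A useful sanity check on the formula is that the eleven probabilities sum to exactly 1.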


Our next three examples illustrate the usefulness of the inclusion-exclusion
identity (Proposition 4.4). In Example 5l, the introduction of probability enables us to
obtain a quick solution to a counting problem.


A total of 36 members of a club play tennis, 28 play squash, and 18 play badminton. 
Furthermore, 22 of the members play both tennis and squash, 12 play both tennis 
and badminton, 9 play both squash and badminton, and 4 play all three sports. How 
many members of this club play at least one of three sports? 


Solution Let N denote the number of members of the club, and introduce
probability by assuming that a member of the club is randomly selected. If, for any
subset C of members of the club, we let P(C) denote the probability that the selected
member is contained in C, then

$$P(C) = \frac{\text{number of members in } C}{N}$$

Now, with T being the set of members that plays tennis, S being the set that plays
squash, and B being the set that plays badminton, we have, from Proposition 4.4,

$$P(T \cup S \cup B) = P(T) + P(S) + P(B) - P(TS) - P(TB) - P(SB) + P(TSB)$$

$$= \frac{36 + 28 + 18 - 22 - 12 - 9 + 4}{N} = \frac{43}{N}$$

Hence, we can conclude that 43 members play at least one of the sports.
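
The count can be cross-checked by building explicit membership sets that reproduce the figures in the example. Everything in this sketch is hypothetical: the member ids are made up, and the region sizes are simply those implied by the stated counts.

```python
from itertools import count

_ids = count()

def new_members(k):
    """Return a set of k fresh, made-up member ids."""
    return {next(_ids) for _ in range(k)}

# Region sizes implied by the counts in the example.
all_three = new_members(4)
tennis_squash_only = new_members(22 - 4)
tennis_badminton_only = new_members(12 - 4)
squash_badminton_only = new_members(9 - 4)
tennis_only = new_members(36 - 18 - 8 - 4)
squash_only = new_members(28 - 18 - 5 - 4)
badminton_only = new_members(18 - 8 - 5 - 4)

T = all_three | tennis_squash_only | tennis_badminton_only | tennis_only
S = all_three | tennis_squash_only | squash_badminton_only | squash_only
B = all_three | tennis_badminton_only | squash_badminton_only | badminton_only

by_formula = (len(T) + len(S) + len(B)
              - len(T & S) - len(T & B) - len(S & B)
              + len(T & S & B))
by_counting = len(T | S | B)
print(by_formula, by_counting)
```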


The next example in this section not only possesses the virtue of giving rise to a 
somewhat surprising answer, but is also of theoretical interest. 


The matching problem 


Suppose that each of N men at a party throws his hat into the center of the room. 
The hats are first mixed up, and then each man randomly selects a hat. What is the 
probability that none of the men selects his own hat? 


Solution We first calculate the complementary probability of at least one man
selecting his own hat. Let us denote by $E_i$, i = 1, 2, ..., N the event that the ith man
selects his own hat. Now, by the inclusion-exclusion identity,
$P\left(\bigcup_{i=1}^{N} E_i\right)$, the probability that at least one of the men selects
his own hat, is given by

$$P\left(\bigcup_{i=1}^{N} E_i\right) = \sum_{i=1}^{N} P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) + \cdots + (-1)^{n+1} \sum_{i_1 < i_2 \cdots < i_n} P(E_{i_1} E_{i_2} \cdots E_{i_n}) + \cdots + (-1)^{N+1} P(E_1 E_2 \cdots E_N)$$

If we regard the outcome of this experiment as a vector of N numbers, where the ith
element is the number of the hat drawn by the ith man, then there are N! possible
outcomes. [The outcome (1, 2, 3, ..., N) means, for example, that each man selects his
own hat.] Furthermore, $E_{i_1} E_{i_2} \cdots E_{i_n}$, the event that each of the n men
$i_1, i_2, \ldots, i_n$ selects his own hat, can occur in any of
$(N - n)(N - n - 1) \cdots 3 \cdot 2 \cdot 1 = (N - n)!$
possible ways; for, of the remaining N − n men, the first can select any of N − n
hats, the second can then select any of N − n − 1 hats, and so on. Hence, assuming
that all N! possible outcomes are equally likely, we see that

$$P(E_{i_1} E_{i_2} \cdots E_{i_n}) = \frac{(N - n)!}{N!}$$

Also, as there are $\binom{N}{n}$ terms in
$\sum_{i_1 < i_2 \cdots < i_n} P(E_{i_1} E_{i_2} \cdots E_{i_n})$, it follows that

$$\sum_{i_1 < i_2 \cdots < i_n} P(E_{i_1} E_{i_2} \cdots E_{i_n}) = \frac{N!\,(N - n)!}{(N - n)!\,n!\,N!} = \frac{1}{n!}$$

Thus, the probability that none of the men selects his own hat is

$$1 - P\left(\bigcup_{i=1}^{N} E_i\right) = 1 - \sum_{n=1}^{N} \frac{(-1)^{n+1}}{n!} = \sum_{i=0}^{N} \frac{(-1)^i}{i!}$$

Upon letting x = −1 in the identity $e^x = \sum_{i=0}^{\infty} x^i/i!$, the preceding
probability when N is large is seen to be approximately equal to $e^{-1} \approx .3679$.
In other words, for N large, the probability that none of the men selects his own hat is
approximately .37. (How many readers would have incorrectly thought that this
probability would go to 1 as $N \to \infty$?)
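
The alternating-sum formula can be compared against a direct enumeration of all N! hat assignments for small N. This is an illustrative sketch; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial

def p_no_match_formula(N):
    """Sum of (-1)^i / i! for i = 0..N, as an exact fraction."""
    return sum(Fraction((-1) ** i, factorial(i)) for i in range(N + 1))

def p_no_match_enumerated(N):
    """Fraction of the N! hat assignments in which no man gets his own hat."""
    outcomes = list(permutations(range(N)))
    no_match = sum(
        1 for hats in outcomes if all(hat != man for man, hat in enumerate(hats))
    )
    return Fraction(no_match, len(outcomes))

for N in range(1, 8):
    print(N, p_no_match_formula(N), p_no_match_enumerated(N))
print(float(p_no_match_formula(10)), exp(-1))
```

The convergence to $e^{-1}$ is fast: by N = 10 the exact value already agrees with $e^{-1}$ to several decimal places.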


For another illustration of the usefulness of the inclusion-exclusion identity, con- 
sider the following example. 


Compute the probability that if 10 married couples are seated at random at a round 
table, then no wife sits next to her husband. 


Solution If we let $E_i$, i = 1, 2, ..., 10 denote the event that the ith couple sit next to
each other, it follows that the desired probability is
$1 - P\left(\bigcup_{i=1}^{10} E_i\right)$. Now, from the inclusion-exclusion identity,

$$P\left(\bigcup_{i=1}^{10} E_i\right) = \sum_{i=1}^{10} P(E_i) - \cdots + (-1)^{n+1} \sum_{i_1 < i_2 < \cdots < i_n} P(E_{i_1} E_{i_2} \cdots E_{i_n}) + \cdots - P(E_1 E_2 \cdots E_{10})$$

To compute $P(E_{i_1} E_{i_2} \cdots E_{i_n})$, we first note that there are 19! ways of arranging
20 people around a round table. (Why?) The number of arrangements that result in
a specified set of n men sitting next to their wives can most easily be obtained by first
thinking of each of the n married couples as being single entities. If this were the
case, then we would need to arrange 20 − n entities around a round table, and there
are clearly (20 − n − 1)! such arrangements. Finally, since each of the n married
couples can be arranged next to each other in one of two possible ways, it follows
that there are $2^n (20 - n - 1)!$ arrangements that result in a specified set of n men
each sitting next to their wives. Therefore,

$$P(E_{i_1} E_{i_2} \cdots E_{i_n}) = \frac{2^n (19 - n)!}{(19)!}$$

Thus, from Proposition 4.4, we obtain that the probability that at least one married
couple sits together is

$$\binom{10}{1} 2^1 \frac{(18)!}{(19)!} - \binom{10}{2} 2^2 \frac{(17)!}{(19)!} + \binom{10}{3} 2^3 \frac{(16)!}{(19)!} - \cdots - \binom{10}{10} 2^{10} \frac{9!}{(19)!} \approx .6605$$

and the desired probability is approximately .3395.
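
The inclusion-exclusion sum is straightforward to evaluate exactly, and for a few couples it can be compared with a brute-force seating count. In the sketch below the function names are hypothetical, the formula is generalized from 10 couples to c couples (assumed valid for c ≥ 2), and the brute force fixes one person's seat to account for rotations.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

def p_no_couple_adjacent(c):
    """Inclusion-exclusion answer for c >= 2 couples at a round table."""
    total = factorial(2 * c - 1)
    at_least_one = sum(
        Fraction((-1) ** (n + 1) * comb(c, n) * 2**n * factorial(2 * c - n - 1), total)
        for n in range(1, c + 1)
    )
    return 1 - at_least_one

def p_no_couple_adjacent_brute(c):
    """Brute force: person 0 sits in seat 0; persons 2j and 2j+1 form couple j."""
    people = 2 * c
    good = total = 0
    for rest in permutations(range(1, people)):
        seating = (0,) + rest
        total += 1
        if all(
            seating[s] // 2 != seating[(s + 1) % people] // 2
            for s in range(people)
        ):
            good += 1
    return Fraction(good, total)

print(float(p_no_couple_adjacent(10)))
print(p_no_couple_adjacent(3), p_no_couple_adjacent_brute(3))
```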
Runs


Consider an athletic team that had just finished its season with a final record of n
wins and m losses. By examining the sequence of wins and losses, we are hoping to
determine whether the team had stretches of games in which it was more likely to
win than at other times. One way to gain some insight into this question is to count
the number of runs of wins and then see how likely that result would be when all
(n + m)!/(n! m!) orderings of the n wins and m losses are assumed equally likely. By
a run of wins, we mean a consecutive sequence of wins. For instance, if n = 10, m = 6,
and the sequence of outcomes was WWLLWWWLWLLLWWWW, then there would
be 4 runs of wins: the first run being of size 2, the second of size 3, the third of size
1, and the fourth of size 4.

Suppose now that a team has n wins and m losses. Assuming that all
$(n + m)!/(n!\,m!) = \binom{n+m}{n}$ orderings are equally likely, let us determine the
probability that there will be exactly r runs of wins. To do so, consider first any vector
of positive integers $x_1, x_2, \ldots, x_r$ with $x_1 + \cdots + x_r = n$, and let us see how
many outcomes result in r runs of wins in which the ith run is of size $x_i$, i = 1, ..., r.
For any such outcome, if we let $y_1$ denote the number of losses before the first run of
wins, $y_2$ the number of losses between the first 2 runs of wins, ..., $y_{r+1}$ the number of
losses after the last run of wins, then the $y_i$ satisfy

$$y_1 + y_2 + \cdots + y_{r+1} = m, \qquad y_1 \geq 0, \; y_{r+1} \geq 0, \; y_i > 0, \; i = 2, \ldots, r$$

and the outcome can be represented schematically as

$$\underbrace{LL \ldots L}_{y_1} \; \underbrace{WW \ldots W}_{x_1} \; \underbrace{L \ldots L}_{y_2} \; \underbrace{WW \ldots W}_{x_2} \cdots \underbrace{WW}_{x_r} \; \underbrace{L \ldots L}_{y_{r+1}}$$

Hence, the number of outcomes that result in r runs of wins (the ith of size $x_i$,
i = 1, ..., r) is equal to the number of integers $y_1, \ldots, y_{r+1}$ that satisfy the
foregoing, or, equivalently, to the number of positive integers

$$\bar{y}_1 = y_1 + 1, \qquad \bar{y}_i = y_i, \; i = 2, \ldots, r, \qquad \bar{y}_{r+1} = y_{r+1} + 1$$

that satisfy

$$\bar{y}_1 + \bar{y}_2 + \cdots + \bar{y}_{r+1} = m + 2$$

By Proposition 6.1 in Chapter 1, there are $\binom{m+1}{r}$ such outcomes. Hence, the
total number of outcomes that result in r runs of wins is $\binom{m+1}{r}$ multiplied by the
number of positive integral solutions of $x_1 + \cdots + x_r = n$. Thus, again from
Proposition 6.1, there are $\binom{m+1}{r}\binom{n-1}{r-1}$ outcomes resulting in r runs of
wins. As there are $\binom{n+m}{n}$ equally likely outcomes, it follows that

$$P(\{r \text{ runs of wins}\}) = \frac{\binom{m+1}{r}\binom{n-1}{r-1}}{\binom{m+n}{n}}, \quad r \geq 1$$


For example, if n = 8 and m = 6, then the probability of 7 runs is
$\binom{7}{7}\binom{7}{6} / \binom{14}{8} = 1/429$ if all $\binom{14}{8}$ outcomes are equally
likely. Hence, if the outcome
was WLWLWLWLWWLWLW, then we might suspect that the team’s probability of
winning was changing over time. (In particular, the probability that the team wins
seems to be quite high when it lost its last game and quite low when it won its last
game.) On the other extreme, if the outcome were WWWWWWWWLLLLLL, then
there would have been only 1 run, and as
$P(\{1 \text{ run}\}) = \binom{7}{1}\binom{7}{0} / \binom{14}{8} = 1/429$, it would thus again
seem unlikely that the team’s probability of winning
remained unchanged over its 14 games.
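
For n = 8 and m = 6 there are only 3003 orderings, so the run-count formula can be checked against a complete enumeration. The sketch below is illustrative; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, groupby
from math import comb

def count_win_runs(sequence):
    """Number of maximal blocks of consecutive W's in a string of W's and L's."""
    return sum(1 for outcome, _ in groupby(sequence) if outcome == "W")

def p_runs_formula(n, m, r):
    """P(exactly r runs of wins) when all orderings of n W's and m L's are equally likely."""
    return Fraction(comb(m + 1, r) * comb(n - 1, r - 1), comb(n + m, n))

def run_counts(n, m):
    """Tally every ordering of n wins and m losses by its number of runs of wins."""
    counts = {}
    for win_slots in combinations(range(n + m), n):
        slots = set(win_slots)
        sequence = "".join("W" if g in slots else "L" for g in range(n + m))
        runs = count_win_runs(sequence)
        counts[runs] = counts.get(runs, 0) + 1
    return counts

counts = run_counts(8, 6)
total = comb(14, 8)
for r in sorted(counts):
    print(r, Fraction(counts[r], total), p_runs_formula(8, 6, r))
```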


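2.6 Probability as a Continuous Set Function

A sequence of events $\{E_n, n \geq 1\}$ is said to be an increasing sequence if

$$E_1 \subset E_2 \subset \cdots \subset E_n \subset E_{n+1} \subset \cdots$$

whereas it is said to be a decreasing sequence if

$$E_1 \supset E_2 \supset \cdots \supset E_n \supset E_{n+1} \supset \cdots$$

If $\{E_n, n \geq 1\}$ is an increasing sequence of events, then we define a new event,
denoted by $\lim_{n \to \infty} E_n$, by

$$\lim_{n \to \infty} E_n = \bigcup_{i=1}^{\infty} E_i$$

Similarly, if $\{E_n, n \geq 1\}$ is a decreasing sequence of events, we define
$\lim_{n \to \infty} E_n$ by

$$\lim_{n \to \infty} E_n = \bigcap_{i=1}^{\infty} E_i$$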

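We now prove the following Proposition 6.1:

Proposition 6.1 If $\{E_n, n \geq 1\}$ is either an increasing or a decreasing sequence of
events, then

$$\lim_{n \to \infty} P(E_n) = P\left(\lim_{n \to \infty} E_n\right)$$

Proof Suppose, first, that $\{E_n, n \geq 1\}$ is an increasing sequence, and define the events
$F_n$, $n \geq 1$, by

$$F_1 = E_1$$

$$F_n = E_n \left(\bigcup_{i=1}^{n-1} E_i\right)^c = E_n E_{n-1}^c, \quad n > 1$$

where we have used the fact that $\bigcup_{i=1}^{n-1} E_i = E_{n-1}$, since the events are
increasing. In words, $F_n$ consists of those outcomes in $E_n$ that are not in any of the
earlier $E_i$, i < n. It is easy to verify that the $F_n$ are mutually exclusive events such that

$$\bigcup_{i=1}^{\infty} F_i = \bigcup_{i=1}^{\infty} E_i \quad \text{and} \quad \bigcup_{i=1}^{n} F_i = \bigcup_{i=1}^{n} E_i \quad \text{for all } n \geq 1$$

Thus,

$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = P\left(\bigcup_{i=1}^{\infty} F_i\right)$$

$$= \sum_{i=1}^{\infty} P(F_i) \quad \text{(by Axiom 3)}$$

$$= \lim_{n \to \infty} \sum_{i=1}^{n} P(F_i)$$

$$= \lim_{n \to \infty} P\left(\bigcup_{i=1}^{n} F_i\right)$$

$$= \lim_{n \to \infty} P\left(\bigcup_{i=1}^{n} E_i\right)$$

$$= \lim_{n \to \infty} P(E_n)$$

which proves the result when $\{E_n, n \geq 1\}$ is increasing.

If $\{E_n, n \geq 1\}$ is a decreasing sequence, then $\{E_n^c, n \geq 1\}$ is an increasing
sequence; hence, from the preceding equations,

$$P\left(\bigcup_{i=1}^{\infty} E_i^c\right) = \lim_{n \to \infty} P(E_n^c)$$

However, because $\bigcup_{i=1}^{\infty} E_i^c = \left(\bigcap_{i=1}^{\infty} E_i\right)^c$, it
follows that

$$P\left(\left(\bigcap_{i=1}^{\infty} E_i\right)^c\right) = \lim_{n \to \infty} P(E_n^c)$$

or, equivalently,

$$1 - P\left(\bigcap_{i=1}^{\infty} E_i\right) = \lim_{n \to \infty} [1 - P(E_n)] = 1 - \lim_{n \to \infty} P(E_n)$$

or

$$P\left(\bigcap_{i=1}^{\infty} E_i\right) = \lim_{n \to \infty} P(E_n)$$

which proves the result.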


Probability and a “paradox” 


Suppose that we possess an infinitely large urn and an infinite collection of balls
labeled ball number 1, number 2, number 3, and so on. Consider an experiment
performed as follows: At 1 minute to 12 P.M., balls numbered 1 through 10 are placed
in the urn and ball number 10 is withdrawn. (Assume that the withdrawal takes no
time.) At 1/2 minute to 12 P.M., balls numbered 11 through 20 are placed in the urn and
ball number 20 is withdrawn. At 1/4 minute to 12 P.M., balls numbered 21 through 30
are placed in the urn and ball number 30 is withdrawn. At 1/8 minute to 12 P.M., and
so on. The question of interest is, How many balls are in the urn at 12 P.M.?

The answer to this question is clearly that there is an infinite number of
balls in the urn at 12 P.M., since any ball whose number is not of the form 10n,
n ≥ 1, will have been placed in the urn and will not have been withdrawn before
12 P.M. Hence, the problem is solved when the experiment is performed as described.

However, let us now change the experiment and suppose that at 1 minute to
12 P.M., balls numbered 1 through 10 are placed in the urn and ball number 1 is
withdrawn; at 1/2 minute to 12 P.M., balls numbered 11 through 20 are placed in the urn
and ball number 2 is withdrawn; at 1/4 minute to 12 P.M., balls numbered 21 through
30 are placed in the urn and ball number 3 is withdrawn; at 1/8 minute to 12 P.M., balls
numbered 31 through 40 are placed in the urn and ball number 4 is withdrawn, and
so on. For this new experiment, how many balls are in the urn at 12 P.M.?

Surprisingly enough, the answer now is that the urn is empty at 12 P.M. For,
consider any ball, say, ball number n. At some time prior to 12 P.M. [in particular, at
$(1/2)^{n-1}$ minutes to 12 P.M.], this ball would have been withdrawn from the urn. Hence,
for each n, ball number n is not in the urn at 12 P.M.; therefore, the urn must be empty
at that time.

Because for all n, the number of balls in the urn after the nth interchange is
the same in both variations of the experiment, most people are surprised that the
two scenarios produce such different results in the limit. It is important to recognize
that the reason the results are different is not because there is an actual paradox, or
mathematical contradiction, but rather because of the logic of the situation, and also
that the surprise results because one’s initial intuition when dealing with infinity is
not always correct. (This latter statement is not surprising, for when the theory of
the infinite was first developed by the mathematician Georg Cantor in the second
half of the nineteenth century, many of the other leading mathematicians of the day
called it nonsensical and ridiculed Cantor for making such claims as that the set of
all integers and the set of all even integers have the same number of elements.)

We see from the preceding discussion that the manner in which the balls are
withdrawn makes a difference. For, in the first case, only balls numbered 10n, n ≥ 1,
are ever withdrawn, whereas in the second case all of the balls are eventually
withdrawn. Let us now suppose that whenever a ball is to be withdrawn, that ball is
randomly selected from among those present. That is, suppose that at 1 minute to
12 P.M. balls numbered 1 through 10 are placed in the urn and a ball is randomly
selected and withdrawn, and so on. In this case, how many balls are in the urn at
12 P.M.?


Solution We shall show that, with probability 1, the urn is empty at 12 P.M. Let us
first consider ball number 1. Define $E_n$ to be the event that ball number 1 is still in
the urn after the first n withdrawals have been made. Clearly,

$$P(E_n) = \frac{9 \cdot 18 \cdot 27 \cdots (9n)}{10 \cdot 19 \cdot 28 \cdots (9n + 1)}$$

[To understand this equation, just note that if ball number 1 is still to be in the
urn after the first n withdrawals, the first ball withdrawn can be any one of 9, the
second any one of 18 (there are 19 balls in the urn at the time of the second
withdrawal, one of which must be ball number 1), and so on. The denominator is similarly
obtained.]

Now, the event that ball number 1 is in the urn at 12 P.M. is just the event
$\bigcap_{n=1}^{\infty} E_n$. Because the events $E_n$, $n \geq 1$, are decreasing events, it
follows from Proposition 6.1 that

$$P\{\text{ball number 1 is in the urn at 12 P.M.}\} = P\left(\bigcap_{n=1}^{\infty} E_n\right) = \lim_{n \to \infty} P(E_n) = \prod_{n=1}^{\infty} \left(\frac{9n}{9n + 1}\right)$$

We now show that

$$\prod_{n=1}^{\infty} \frac{9n}{9n + 1} = 0$$

Since

$$\prod_{n=1}^{\infty} \left(\frac{9n}{9n + 1}\right) = \left[\prod_{n=1}^{\infty} \left(\frac{9n + 1}{9n}\right)\right]^{-1}$$

this is equivalent to showing that

$$\prod_{n=1}^{\infty} \left(1 + \frac{1}{9n}\right) = \infty$$

Now, for all $m \geq 1$,

$$\prod_{n=1}^{\infty} \left(1 + \frac{1}{9n}\right) \geq \prod_{n=1}^{m} \left(1 + \frac{1}{9n}\right)$$

$$= \left(1 + \frac{1}{9}\right)\left(1 + \frac{1}{18}\right)\left(1 + \frac{1}{27}\right) \cdots \left(1 + \frac{1}{9m}\right)$$

$$> \frac{1}{9} + \frac{1}{18} + \frac{1}{27} + \cdots + \frac{1}{9m}$$

$$= \frac{1}{9} \sum_{i=1}^{m} \frac{1}{i}$$

Hence, letting $m \to \infty$ and using the fact that $\sum_{i=1}^{\infty} 1/i = \infty$ yields

$$\prod_{n=1}^{\infty} \left(1 + \frac{1}{9n}\right) = \infty$$

Thus, letting $F_i$ denote the event that ball number i is in the urn at 12 P.M., we have
shown that $P(F_1) = 0$. Similarly, we can show that $P(F_i) = 0$ for all i.
(For instance, the same reasoning shows that
$P(F_i) = \prod_{n=2}^{\infty} [9n/(9n + 1)]$ for i = 11, 12, ..., 20.) Therefore, the
probability that the urn is not empty at 12 P.M., $P\left(\bigcup_{i=1}^{\infty} F_i\right)$,
satisfies

$$P\left(\bigcup_{i=1}^{\infty} F_i\right) \leq \sum_{i=1}^{\infty} P(F_i) = 0$$

by Boole’s inequality.
Thus, with probability 1, the urn will be empty at 12 P.M.
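
The product converges to 0 quite slowly, which a short numerical experiment makes visible. This is an illustrative sketch in floating point; the function names are hypothetical. It evaluates $P(E_n)$ for increasing n and checks the harmonic-series lower bound used in the argument above.

```python
def prob_ball1_survives(n):
    """P(E_n): ball number 1 is still in the urn after n random withdrawals."""
    p = 1.0
    for j in range(1, n + 1):
        p *= 9 * j / (9 * j + 1)
    return p

def harmonic(m):
    """Partial sum 1 + 1/2 + ... + 1/m of the harmonic series."""
    return sum(1 / i for i in range(1, m + 1))

for n in (1, 10, 1000, 100000):
    survive = prob_ball1_survives(n)
    # The reciprocal of P(E_n) is the partial product of (1 + 1/(9j)),
    # which the argument bounds below by (1/9) * harmonic(n).
    print(n, survive, 1 / survive, harmonic(n) / 9)
```

The survival probability keeps decreasing as n grows, but only at a rate that mirrors the slow divergence of the harmonic series.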


2.7 Probability as a Measure of Belief 


Thus far we have interpreted the probability of an event of a given experiment as 
being a measure of how frequently the event will occur when the experiment is 
continually repeated. However, there are also other uses of the term probability. 
For instance, we have all heard such statements as “It is 90 percent probable that 
Shakespeare actually wrote Hamlet” or “The probability that Oswald acted alone in 
assassinating Kennedy is .8.” How are we to interpret these statements? 

The most simple and natural interpretation is that the probabilities referred to 
are measures of the individual’s degree of belief in the statements that he or she 
is making. In other words, the individual making the foregoing statements is quite 
certain that Oswald acted alone and is even more certain that Shakespeare wrote 
Hamlet. This interpretation of probability as being a measure of the degree of one’s 
belief is often referred to as the personal or subjective view of probability. 

It seems logical to suppose that a “measure of the degree of one’s belief” should
satisfy all of the axioms of probability. For example, if we are 70 percent certain that
Shakespeare wrote Julius Caesar and 10 percent certain that it was actually
Marlowe, then it is logical to suppose that we are 80 percent certain that it was either
Shakespeare or Marlowe. Hence, whether we interpret probability as a measure of
belief or as a long-run frequency of occurrence, its mathematical properties remain
unchanged.


Suppose that in a 7-horse race, you believe that each of the first 2 horses has a 20 
percent chance of winning, horses 3 and 4 each have a 15 percent chance, and the 
remaining 3 horses have a 10 percent chance each. Would it be better for you to 
wager at even money that the winner will be one of the first three horses or to wager, 
again at even money, that the winner will be one of the horses 1, 5, 6, and 7? 


Solution On the basis of your personal probabilities concerning the outcome of the
race, your probability of winning the first bet is .2 + .2 + .15 = .55, whereas
it is .2 + .1 + .1 + .1 = .5 for the second bet. Hence, the first wager is more
attractive.


Note that in supposing that a person’s subjective probabilities are always consis- 
tent with the axioms of probability, we are dealing with an idealized rather than an 
actual person. For instance, if we were to ask someone what he thought the chances 
were of 


(a) rain today, 

(b) rain tomorrow, 

(c) rain both today and tomorrow, 

(d) rain either today or tomorrow, 
it is quite possible that, after some deliberation, he might give 30 percent, 40 percent, 
20 percent, and 60 percent as answers. Unfortunately, such answers (or such subjec- 
tive probabilities) are not consistent with the axioms of probability. (Why not?) We 
would of course hope that after this was pointed out to the respondent, he would 


change his answers. (One possibility we could accept is 30 percent, 40 percent, 10 
percent, and 60 percent.) 
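
One way to test a set of answers like these is to check them against the identity P(A ∪ B) = P(A) + P(B) − P(AB), with A the event of rain today and B the event of rain tomorrow. The sketch below is only an illustration: the function name is hypothetical, it works in whole percents to avoid rounding issues, and it checks just two necessary conditions rather than every consequence of the axioms.

```python
def is_consistent(p_a, p_b, p_both, p_either):
    """Check two consequences of the axioms for answers given in whole percents."""
    return (
        p_both <= min(p_a, p_b)                # AB is contained in both A and B
        and p_either == p_a + p_b - p_both     # P(A or B) = P(A) + P(B) - P(AB)
    )

print(is_consistent(30, 40, 20, 60))
print(is_consistent(30, 40, 10, 60))
```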


Summary

Let S denote the set of all possible outcomes of an experiment. S is called the sample
space of the experiment. An event is a subset of S. If $A_i$, i = 1, ..., n, are events, then
$\bigcup_{i=1}^{n} A_i$, called the union of these events, consists of all outcomes that are
in at least one of the events $A_i$, i = 1, ..., n. Similarly, $\bigcap_{i=1}^{n} A_i$,
sometimes written as $A_1 \cdots A_n$, is called the intersection of the events $A_i$ and
consists of all outcomes that are in all of the events $A_i$, i = 1, ..., n.

For any event A, we define $A^c$ to consist of all outcomes in the sample space that are
not in A. We call $A^c$ the complement of the event A. The event $S^c$, which is
empty of outcomes, is designated by ∅ and is called the null set. If AB = ∅, then we
say that A and B are mutually exclusive.

For each event A of the sample space S, we suppose that a number P(A), called the
probability of A, is defined and is such that

(i) 0 ≤ P(A) ≤ 1
(ii) P(S) = 1
(iii) For mutually exclusive events $A_i$, i ≥ 1,

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

P(A) represents the probability that the outcome of the experiment is in A.
It can be shown that

$$P(A^c) = 1 - P(A)$$

A useful result is that

$$P(A \cup B) = P(A) + P(B) - P(AB)$$


which can be generalized to give

$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - \sum\sum_{i<j} P(A_i A_j) + \sum\sum\sum_{i<j<k} P(A_i A_j A_k) + \cdots + (-1)^{n+1} P(A_1 \cdots A_n)$$

This result is known as the inclusion-exclusion identity.

If S is finite and each one point set is assumed to have equal probability, then

$$P(A) = \frac{|A|}{|S|}$$

where |E| denotes the number of outcomes in the event E.

P(A) can be interpreted either as a long-run relative frequency or as a measure of
one’s degree of belief.


Problems


1. Organizers of a three-day conference are considering
food items for lunch. The two available options are either
fish or meat. Set up a set as the sample space for all
possibilities. How would the set change if the organizers insist
that the same food item not be served on two consecutive
days?


2. In a session, an archer shoots rounds of 3 arrows. The
session is terminated when the target is hit with all three
arrows in one round. Set up a set to serve as the sample
space. Let $A_n$ denote the event comprising sessions with
durations longer than n. Determine the event $\bigcap_{n=1}^{\infty} A_n$.


3. Two football teams are playing. Let A be the event that
the match ends in a draw, and let B be the event that the
home team wins. Assuming that not more than 6 goals
are scored in all, list the sample space and the following
events: $A \cap B$ and $A \cup B$. Let C be the event that the away
team scores. List the elements of $A \cup C^c$, $A^c \cap B \cap C$.
Show directly that $A \cap (B^c \cup C) = A$.


4. A, B, and C take turns flipping a coin. The first one to
get a head wins. The sample space of this experiment can
be defined by

$$S = \{1, 01, 001, 0001, \ldots, 0000\cdots\}$$

(a) Interpret the sample space.
(b) Define the following events in terms of S:

(i) A wins = A.
(ii) B wins = B.
(iii) $(A \cup B)^c$.
Assume that A flips first, then B, then C, then A,
and so on.


5. A system is composed of 5 components, each of which
is either working or failed. Consider an experiment that
consists of observing the status of each component, and
let the outcome of the experiment be given by the vector
$(x_1, x_2, x_3, x_4, x_5)$, where $x_i$ is equal to 1 if component i
is working and is equal to 0 if component i is failed.


(a) How many outcomes are in the sample space of this 
experiment? 

(b) Suppose that the system will work if components 1 and 
2 are both working, or if components 3 and 4 are both 
working, or if components 1, 3, and 5 are all working. Let 
W be the event that the system will work. Specify all the 
outcomes in W. 

(c) Let A be the event that components 4 and 5 are 
both failed. How many outcomes are contained in the 
event A? 


(d) Write out all the outcomes in the event AW. 


6. A hospital administrator codes incoming patients suf- 
fering gunshot wounds according to whether they have 
insurance (coding 1 if they do and 0 if they do not) and 
according to their condition, which is rated as good (g), fair 
(f), or serious (s). Consider an experiment that consists of 
the coding of such a patient. 


(a) Give the sample space of this experiment. 

(b) Let A be the event that the patient is in serious condi- 
tion. Specify the outcomes in A. 

(c) Let B be the event that the patient is uninsured. Specify 
the outcomes in B. 

(d) Give all the outcomes in the event $B^c \cup A$.


7. A bus departs from Bus Stop 16 with 20 passengers. The 
bus’s journey will end at Bus Stop 20. Passengers can leave 
the bus at any stop, but no passengers can board. Work out 
the number of possibilities in 


(a) the whole sample space. 


(b) the event that no passenger leaves at Bus Stops 16 
or 18. 


(c) the event that half of the passengers leave by Bus 
Stop 18. 


8. The union of events A and B make up the whole sample 
space Q. If P(A) = .6 and P(B) = .8, find the probabil- 
ity of 


(a) the event of all the possibilities that A shares with B. 
(b) the event of those possibilities that are exclusively A’s. 


(c) the events whose possibilities are either those of A’s or 
those of B’s that it does not share with A. 


9. In a school, three-quarters of students are involved in 
sports, half are involved in cultural activities, and one- 
eighth are involved in neither. Calculate the probability 
that a student is involved in 


(a) either sports or in cultural activities. 
(b) both sports and cultural activities. 
(c) cultural activities but not sports. 


10. A restaurant manager was listing the type of items not 
ordered by her clients, 80 percent of whom have meat. Ten 
percent do not have meat or dessert, and 25 percent do 
not have any dessert at all. What is the probability that a 
customer will have 


(a) either meat or dessert or both? 
(b) meat but not dessert? 
(c) meat and dessert? 


11. Of 120 persons applying for a job, 80 have work expe- 
rience, 60 have qualifications, and 40 experienced appli- 
cants have no qualifications. An auditor randomly selects 
an applicant. What is the probability that this applicant is 


(a) qualified and experienced? 
(b) neither qualified nor experienced? 


12. Trincas Tours offers three add-ons to their holiday 
packages: meals (M), sightseeing trips (S), and theater 
visits (T). Past records show that 31 percent of clients 
chose only meals, 18 percent only sightseeing, and 7 per- 
cent only theatre. Nine percent chose all three options. 
Eleven percent chose meals and sightseeing only, 8 per- 
cent chose sightseeing and theatre only, and 7 percent 
chose meals and theater only. What is the probability that 
a client 


(a) chooses exactly two options? 


(b) chooses one or more options excluding theatre? 
(c) refuses all options? 


13. A certain town with a population of 100,000 has 3 
newspapers: I, II, and III. The proportions of townspeople
who read these papers are as follows:


I: 10 percent      I and II: 8 percent      I and II and III: 1 percent
II: 30 percent     I and III: 2 percent
III: 5 percent     II and III: 4 percent


(The list tells us, for instance, that 8000 people read news-
papers I and II.)


(a) Find the number of people who read only one newspa- 
per. 

(b) How many people read at least two newspapers? 

(c) If I and III are morning papers and II is an evening 
paper, how many people read at least one morning paper 
plus an evening paper? 


(d) How many people do not read any newspapers? 


(e) How many people read only one morning paper and 
one evening paper? 


14. A bistro owner was taking stock of 1101 orders 
received on a particular day. Four hundred and eleven 
persons ordered only drinks, 231 only food, and 62 only 
sweets. Forty five had drinks and sweets but no food, 312 
had food and drinks but no sweets, and 11 had sweets 
and food but no drinks. Forty one customers had all three. 
However, the owner has a feeling that he has recorded one 
figure incorrectly. The other set of more precise data that 
he has is that the number of customers who chose drinks, 
food, and sweets were 797, 595, and 147 respectively. Trans- 
late the numbers above into probabilities and establish 
which number is incorrect. 


15. Each of five contestants competing in a quiz is asked
10 quick-fire questions, and their answers may be correct
or incorrect with equal probability. The score of each con- 
testant depends on the number of correct answers. What is 
the probability that 

(a) all contestants get a different score? 

(b) exactly 2 of them get the same score? 

(c) exactly 3 of them get the same score? 


16. Poker dice is played by simultaneously rolling 5 dice. 
Show that 

(a) P{no two alike} = .0926; 

(b) P{one pair} = .4630; 

(c) P{two pair} = .2315; 

(d) P{three alike} = .1543; 

(e) P{full house} = .0386; 

(f) P{four alike} = .0193; 

(g) P{five alike} = .0008. 


17. Twenty five people, consisting of 15 women and 10 men 
are lined up in a random order. Find the probability that 
the ninth woman to appear is in position 17. That is, find 
the probability there are 8 women in positions 1 thru 16 
and a woman in position 17. 


18. Each of 20 families selected to take part in a trea-
sure hunt consists of a mother, father, son, and daughter.
Assuming that they look for the treasure in pairs that are
randomly chosen from the 80 participating individuals and
that each pair has the same probability of finding the trea-
sure, calculate the probability that the pair that finds the
treasure includes a mother but not her daughter.


19. An urn contains 11 red, 4 blue, and 7 black balls, while 
another urn contains 5 red, 3 yellow, and 6 black balls. 
What is the probability that a blindfolded person who 
picks one ball from each urn chooses either two balls of 
the same color or two balls of colors that are exclusive to 
each urn? 


20. Suppose that you are playing blackjack against a 
dealer. In a freshly shuffled deck, what is the probability 
that neither you nor the dealer is dealt a blackjack? 


21. Chocolate bars can be bought from Nejku’s Confec- 
tionery as singles or in packs of either two, four, ten, or 
twenty. Nejku knows that last week he sold 23 singles, 45 
packs of two bars, 33 packs of four bars, 12 packs of ten 
bars, and 11 packs of twenty bars. Customers return wrap- 
pers to Nejku so that they win customer points. 


(a) What is the probability that a customer chosen at ran- 
dom would buy a single bar or a pack with 2, 4, 10, and 20 
bars, respectively? 

(b) If a returned chocolate wrapper is chosen at random, 
what is the probability that it belonged to a customer who 
bought a single bar or a pack with 2, 4, 10, and 20 bars, 
respectively? 


22. Each of 52 people is given a deck of cards, which they
are asked to shuffle independently of each other. What is the
probability that 


(a) the order of the cards in each shuffled deck is unique? 
(b) there is exactly one card that occupies the same posi- 
tion in the shuffled decks received from all 52 persons? 


(c) all cards occupy the same position in all the shuffled 
decks? 


23. Two numbers from 1 to 10 are selected randomly in 
succession. What is the probability that their product is less 
than or equal to 50? 


24. Urn 1 has 4 red and 3 white balls. Urn 2 has 4 red 
and 5 white balls. One ball is taken from each urn and 
exchanged. Which configuration is more probable after 2 
exchanges? 


25. In each round, an archer shoots two arrows at a tar- 
get that has an outer ring, an inner ring, and a bullseye. 
The archer will stop shooting once she hits bullseye twice, 
or the inner ring twice, or the bullseye and the inner ring 
once each. The probabilities of hitting the inner ring and 
the bullseye are .5 and .1 respectively. 


(a) What is the probability she will never stop shooting? 


(b) What is the probability she will stop shooting for hav- 
ing hit bullseye twice? 


26. The game of craps is played as follows: A player rolls 
two dice. If the sum of the dice is either a 2, 3, or 12, the 
player loses; if the sum is either a 7 or an 11, the player 
wins. If the outcome is anything else, the player continues 
to roll the dice until she rolls either the initial outcome or a 
7. If the 7 comes first, the player loses, whereas if the initial 
outcome reoccurs before the 7 appears, the player wins. 
Compute the probability of a player winning at craps. 

Hint: Let $E_i$ denote the event that the initial outcome is
i and the player wins. The desired probability is $\sum_{i=2}^{12} P(E_i)$.
To compute $P(E_i)$, define the events $E_{i,n}$ to be the event
that the initial sum is i and the player wins on the nth roll.
Argue that $P(E_i) = \sum_{n=1}^{\infty} P(E_{i,n})$.


27. An urn contains 3 red and 7 black balls. Players A and 
B withdraw balls from the urn consecutively until a red 
ball is selected. Find the probability that A selects the red 
ball. (A draws the first ball, then B, and so on. There is no 
replacement of the balls drawn.) 


28. An urn contains 5 red, 6 blue, and 8 green balls. If a set 
of 3 balls is randomly selected, what is the probability that 
each of the balls will be (a) of the same color? (b) of differ- 
ent colors? Repeat under the assumption that whenever a 
ball is selected, its color is noted and it is then replaced in 
the urn before the next selection. This is known as sam- 
pling with replacement. 


29. An urn contains n white and m black balls, where n and 
m are positive numbers. 


(a) If two balls are randomly withdrawn, what is the prob- 
ability that they are the same color? 

(b) If a ball is randomly withdrawn and then replaced 
before the second one is drawn, what is the probability that 
the withdrawn balls are the same color? 

(c) Show that the probability in part (b) is always larger 
than the one in part (a). 


30. The chess clubs of two schools consist of, respectively, 
8 and 9 players. Four members from each club are ran- 
domly chosen to participate in a contest between the two 
schools. The chosen players from one team are then ran- 
domly paired with those from the other team, and each 
pairing plays a game of chess. Suppose that Rebecca and 
her sister Elise are on the chess clubs at different schools. 
What is the probability that 


(a) Rebecca and Elise will be paired? 

(b) Rebecca and Elise will be chosen to represent their 
schools but will not play each other? 

(c) either Rebecca or Elise will be chosen to represent her 
school? 


31. A 3-person basketball team consists of a guard, a for- 
ward, and a center. 


(a) If a person is chosen at random from each of three dif- 
ferent such teams, what is the probability of selecting a 
complete team? 

(b) What is the probability that all 3 players selected play 
the same position? 


32. A group of individuals containing b boys and g girls 
is lined up in random order; that is, each of the (b + g)! 
permutations is assumed to be equally likely. What is the 
probability that the person in the ith position, $1 \le i \le b + g$,
is a girl? 


33. A forest contains 20 elk, of which 5 are captured, 
tagged, and then released. A certain time later, 4 of the 
20 elk are captured. What is the probability that 2 of 
these 4 have been tagged? What assumptions are you 
making? 


34. The second Earl of Yarborough is reported to have bet 
at odds of 1000 to 1 that a bridge hand of 13 cards would 
contain at least one card that is ten or higher. (By ten or 
higher we mean that a card is either a ten, a jack, a queen, 
a king, or an ace.) Nowadays, we call a hand that has no 
cards higher than 9 a Yarborough. What is the probability 
that a randomly selected bridge hand is a Yarborough? 


35. Seven balls are randomly withdrawn from an urn that 
contains 12 red, 16 blue, and 18 green balls. Find the prob- 
ability that 


(a) 3 red, 2 blue, and 2 green balls are withdrawn; 
(b) at least 2 red balls are withdrawn; 
(c) all withdrawn balls are the same color; 


(d) either exactly 3 red balls or exactly 3 blue balls are 
withdrawn. 


36. Two cards are chosen at random from a deck of 52 play- 
ing cards. What is the probability that they 


(a) are both aces? 
(b) have the same value? 


37. An instructor gives her class a set of 10 problems with 
the information that the final exam will consist of a ran- 
dom selection of 5 of them. If a student has figured out 
how to do 7 of the problems, what is the probability that 
he or she will answer correctly 


(a) all 5 problems? 
(b) at least 4 of the problems? 


38. There are n socks, 3 of which are red, in a drawer. What
is the value of n if, when 2 of the socks are chosen ran-
domly, the probability that they are both red is 1/2?


39. There are 5 hotels in a certain town. If 3 people check
into hotels in a day, what is the probability that they each
check into a different hotel? What assumptions are you
making?


40. If 4 balls are randomly chosen from an urn containing 
4 red, 5 white, 6 blue, and 7 green balls, find the probability 
that 


(a) at least one of the 4 balls chosen is green; 
(b) one ball of each color is chosen. 


41. If a die is rolled 4 times, what is the probability that 6 
comes up at least once? 


42. Two dice are thrown n times in succession. Compute
the probability that double 6 appears at least once. How
large need n be to make this probability at least 1/2?


43. (a) If N people, including A and B, are randomly 
arranged in a line, what is the probability that A and B 
are next to each other? 

(b) What would the probability be if the people were ran- 
domly arranged in a circle? 


44. Five people, designated as A, B, C, D, E, are arranged 
in linear order. Assuming that each possible order is 
equally likely, what is the probability that 


(a) there is exactly one person between A and B? 
(b) there are exactly two people between A and B? 
(c) there are three people between A and B? 


45. A woman has n keys, of which one will open her door. 


(a) If she tries the keys at random, discarding those that 
do not work, what is the probability that she will open the 
door on her kth try? 


(b) What if she does not discard previously tried keys? 


46. How many people have to be in a room in order that
the probability that at least two of them celebrate their
birthday in the same month is at least 1/2? Assume that all
possible monthly outcomes are equally likely.


47. Suppose that 5 of the numbers 1,2,...,14 are chosen. 
Find the probability that 9 is the third smallest value cho- 
sen. 


48. Given 20 people, what is the probability that among 
the 12 months in the year, there are 4 months containing 
exactly 2 birthdays and 4 containing exactly 3 birthdays? 


49. A group of 6 men and 6 women is randomly divided 
into 2 groups of size 6 each. What is the probability that 
both groups will have the same number of men? 


50. In a hand of bridge, find the probability that you have
5 spades and your partner has the remaining 8.


51. Suppose that n balls are randomly distributed into N
compartments. Find the probability that m balls will fall
into the first compartment. Assume that all $N^n$ arrange-
ments are equally likely.


52. A closet contains 10 pairs of shoes. If 8 shoes are ran- 
domly selected, what is the probability that there will be 


(a) no complete pair? 
(b) exactly 1 complete pair? 


53. If 8 people, consisting of 4 couples, are randomly 
arranged in a row, find the probability that no person is 
next to their partner. 


54. Compute the probability that a bridge hand is void in
at least one suit. Note that the answer is not


$$\frac{\binom{4}{1}\binom{39}{13}}{\binom{52}{13}}$$


(Why not?)
Hint: Use Proposition 4.4.


55. Compute the probability that a hand of 13 cards 
contains 


(a) the ace and king of at least one suit; 
(b) all 4 of at least 1 of the 13 denominations. 


56. Two players play the following game: Player A chooses 
one of the three spinners pictured in Figure 2.6, and then 
player B chooses one of the remaining two spinners. Both 
players then spin their spinner, and the one that lands on 
the higher number is declared the winner. Assuming that 
each spinner is equally likely to land in any of its 3 regions, 
would you rather be player A or player B? Explain your 
answer! 


Figure 2.6 Spinners. 


Theoretical Exercises 


Prove the following relations: 


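1. $EF \subset E \subset E \cup F$.
2. If $E \subset F$, then $F^c \subset E^c$.


3. $F = FE \cup FE^c$ and $E \cup F = E \cup E^cF$.

4. $\left(\bigcup_{i=1}^{\infty} E_i\right) F = \bigcup_{i=1}^{\infty} E_iF$ and


$\left(\bigcap_{i=1}^{\infty} E_i\right) \cup F = \bigcap_{i=1}^{\infty} (E_i \cup F)$.


5. For any sequence of events $E_1, E_2, \ldots$, define a new
sequence $F_1, F_2, \ldots$ of disjoint events (that is, events such
that $F_iF_j = \emptyset$ whenever $i \ne j$) such that for all $n \ge 1$,


$$\bigcup_{i=1}^{n} F_i = \bigcup_{i=1}^{n} E_i$$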


6. Let E, F, and G be three events. Find expressions for 
the events so that, of E, F, and G, 


(a) only E occurs; 

(b) both E and G, but not F, occur; 
(c) at least one of the events occurs; 
(d) at least two of the events occur; 
(e) all three events occur; 

(f) none of the events occurs; 

(g) at most one of the events occurs; 
(h) at most two of the events occur; 
(i) exactly two of the events occur; 
(j) at most three of the events occur. 


7. Use Venn diagrams 


(a) to simplify the expression $(E \cup F)(E \cup F^c)$;
(b) to prove DeMorgan's laws for events E and F. [That is,
prove $(E \cup F)^c = E^cF^c$, and $(EF)^c = E^c \cup F^c$.]


8. Let S be a given set. If, for some $k > 0$, $S_1, S_2, \ldots, S_k$
are mutually exclusive nonempty subsets of S such that
$\bigcup_{i=1}^{k} S_i = S$, then we call the set $\{S_1, S_2, \ldots, S_k\}$ a parti-
tion of S. Let $T_n$ denote the number of different parti-
tions of $\{1, 2, \ldots, n\}$. Thus, $T_1 = 1$ (the only partition
being $S_1 = \{1\}$) and $T_2 = 2$ (the two partitions being
$\{\{1, 2\}\}$, $\{\{1\}, \{2\}\}$).


(a) Show, by computing all partitions, that $T_3 = 5$, $T_4 = 15$.


(b) Show that


$$T_{n+1} = 1 + \sum_{k=1}^{n} \binom{n}{k} T_k$$


and use this equation to compute $T_{10}$.

Hint: One way of choosing a partition of n + 1 items is to
call one of the items special. Then we obtain different par-
titions by first choosing k, k = 0, 1, ..., n, then a subset of
size n − k of the nonspecial items, and then any of the $T_k$
partitions of the remaining k nonspecial items. By adding
the special item to the subset of size n − k, we obtain a
partition of all n + 1 items.


9. Suppose that an experiment is performed n times. For
any event E of the sample space, let n(E) denote the num-
ber of times that event E occurs and define f(E) = n(E)/n.
Show that f(·) satisfies Axioms 1, 2, and 3.


10. Prove that $P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(E^cFG) - P(EF^cG) - P(EFG^c) - 2P(EFG)$.


11. If P(E) = .9 and P(F) = .8, show that $P(EF) \ge .7$. In
general, prove Bonferroni's inequality, namely,


$$P(EF) \ge P(E) + P(F) - 1$$


12. Show that the probability that exactly one of the events 
E or F occurs equals P(E) + P(F) — 2P(EF). 


13. Prove that $P(EF^c) = P(E) - P(EF)$.
14. Prove Proposition 4.4 by mathematical induction. 


15. An urn contains M white and N black balls. If a ran- 
dom sample of size r is chosen, what is the probability that 
it contains exactly k white balls? 


16. Use induction to generalize Bonferroni's inequality to
n events. That is, show that


$$P(E_1E_2 \cdots E_n) \ge P(E_1) + \cdots + P(E_n) - (n - 1)$$


17. Consider the matching problem, Example 5m, and
define $A_N$ to be the number of ways in which the N
men can select their hats so that no man selects his own.
Argue that


$$A_N = (N - 1)(A_{N-1} + A_{N-2})$$


This formula, along with the boundary conditions $A_1 = 0$,
$A_2 = 1$, can then be solved for $A_N$, and the desired proba-
bility of no matches would be $A_N/N!$.

Hint: After the first man selects a hat that is not his own, 
there remain N — 1 men to select among a set of N — 1 
hats that does not contain the hat of one of these men. 
Thus, there is one extra man and one extra hat. Argue that 
we can get no matches either with the extra man select- 
ing the extra hat or with the extra man not selecting the 
extra hat. 


18. Let $f_n$ denote the number of ways of tossing a coin n
times such that successive heads never appear. Argue that


$$f_n = f_{n-1} + f_{n-2}, \qquad n \ge 2, \text{ where } f_0 = 1,\ f_1 = 2$$


Hint: How many outcomes are there that start with a head,
and how many start with a tail? If $P_n$ denotes the proba-
bility that successive heads never appear when a coin is
tossed n times, find $P_n$ (in terms of $f_n$) when all possible
outcomes of the n tosses are assumed equally likely. Com-
pute $P_{10}$.


19. An urn contains n red and m blue balls. They are with-
drawn one at a time until a total of r, $r \le n$, red balls have
been withdrawn. Find the probability that a total of k balls
are withdrawn.

Hint: A total of k balls will be withdrawn if there are r − 1
red balls in the first k − 1 withdrawals and the kth with-
drawal is a red ball.


20. Consider an experiment whose sample space consists
of a countably infinite number of points. Show that not all
points can be equally likely. Can all points have a positive
probability of occurring?


*21. Consider Example 5o, which is concerned with the
number of runs of wins obtained when n wins and m losses
are randomly permuted. Now consider the total number of
runs (that is, win runs plus loss runs) and show that


$$P\{2k \text{ runs}\} = 2\,\frac{\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{n}}$$


$$P\{2k + 1 \text{ runs}\} = \frac{\binom{m-1}{k-1}\binom{n-1}{k} + \binom{m-1}{k}\binom{n-1}{k-1}}{\binom{m+n}{n}}$$


Self-Test Problems and Exercises


1. A cafeteria offers a three-course meal consisting of an
entree, a starch, and a dessert. The possible choices are
given in the following table:


Course     Choices
Entree     Chicken or roast beef
Starch     Pasta or rice or potatoes
Dessert    Ice cream or Jello or apple pie or a peach


A person is to choose one course from each category.


(a) How many outcomes are in the sample space?

(b) Let A be the event that ice cream is chosen. How many
outcomes are in A?

(c) Let B be the event that chicken is chosen. How many
outcomes are in B?

(d) List all the outcomes in the event AB.

(e) Let C be the event that rice is chosen. How many out-
comes are in C?

(f) List all the outcomes in the event ABC.


2. A customer visiting the suit department of a certain
store will purchase a suit with probability .22, a shirt with
probability .30, and a tie with probability .28. The customer
will purchase both a suit and a shirt with probability .11,
both a suit and a tie with probability .14, and both a shirt
and a tie with probability .10. A customer will purchase all
3 items with probability .06. What is the probability that a
customer purchases


(a) none of these items? 
(b) exactly 1 of these items? 


3. A deck of cards is dealt out. What is the probability that 
the 14th card dealt is an ace? What is the probability that 
the first ace occurs on the 14th card? 


4. Let A denote the event that the midtown temperature 
in Los Angeles is 70°F, and let B denote the event that 
the midtown temperature in New York is 70°F. Also, let 
C denote the event that the maximum of the midtown 
temperatures in New York and in Los Angeles is 70°F. If 
P(A) = .3, P(B) = .4, and P(C) = .2, find the probabil-
ity that the minimum of the two midtown temperatures is 
70°F. 


5. An ordinary deck of 52 cards is shuffled. What is the 
probability that the top four cards have 


(a) different denominations? 
(b) different suits? 


6. Urn A contains 3 red and 3 black balls, whereas urn 
B contains 4 red and 6 black balls. If a ball is randomly 
selected from each urn, what is the probability that the 
balls will be the same color? 


7. In a state lottery, a player must choose 8 of the num-
bers from 1 to 40. The lottery commission then performs
an experiment that selects 8 of these 40 numbers. Assum-
ing that the choice of the lottery commission is equally
likely to be any of the $\binom{40}{8}$ combinations, what is the
probability that a player has


(a) all 8 of the numbers selected by the lottery 
commission? 


(b) 7 of the numbers selected by the lottery commission? 


(c) at least 6 of the numbers selected by the lottery 
commission? 


8. From a group of 3 first-year students, 4 sophomores, 4 
juniors, and 3 seniors, a committee of size 4 is randomly 
selected. Find the probability that the committee will con- 
sist of 


(a) 1 from each class; 
(b) 2 sophomores and 2 juniors; 
(c) only sophomores or juniors. 


9. For a finite set A, let N(A) denote the number of ele- 
ments in A. 


(a) Show that

$$N(A \cup B) = N(A) + N(B) - N(AB)$$


(b) More generally, show that


$$N\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} N(A_i) - \sum\sum_{i<j} N(A_iA_j) + \cdots + (-1)^{n+1} N(A_1 \cdots A_n)$$


10. Consider an experiment that consists of 6 horses, num- 
bered 1 through 6, running a race, and suppose that the 
sample space consists of the 6! possible orders in which the 
horses finish. Let A be the event that the number-1 horse 
is among the top three finishers, and let B be the event that 
the number-2 horse comes in second. How many outcomes 
are in the event A U B? 


11. A 5-card hand is dealt from a well-shuffled deck of 52 
playing cards. What is the probability that the hand con- 
tains at least one card from each of the four suits? 


12. A basketball team consists of 6 frontcourt and 4 
backcourt players. If players are divided into roommates 
at random, what is the probability that there will be exactly 
two roommate pairs made up of a backcourt and a front- 
court player? 


13. Suppose that a person chooses a letter at random from
RESERVE and then chooses one at random from
VERTICAL. What is the probability that the same
letter is chosen?


14. Prove Boole's inequality:


$$P\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} P(A_i)$$


15. Show that if $P(A_i) = 1$ for all $i \ge 1$, then $P\left(\bigcap_{i=1}^{\infty} A_i\right) = 1$.


16. Let $T_k(n)$ denote the number of partitions of the set
$\{1, \ldots, n\}$ into k nonempty subsets, where $1 \le k \le n$. (See
Theoretical Exercise 8 for the definition of a partition.)
Argue that


$$T_k(n) = kT_k(n - 1) + T_{k-1}(n - 1)$$


Hint: In how many partitions is {1} a subset, and in how 
many is 1 an element of a subset that contains other 
elements? 


17. Five balls are randomly chosen, without replacement, 
from an urn that contains 5 red, 6 white, and 7 blue balls. 
Find the probability that at least one ball of each color is 
chosen. 


18. Four red, 8 blue, and 5 green balls are randomly 
arranged in a line. 


(a) What is the probability that the first 5 balls are blue? 
(b) What is the probability that none of the first 5 balls is 
blue? 

(c) What is the probability that the final 3 balls are of dif- 
ferent colors? 

(d) What is the probability that all the red balls are 
together? 


19. Ten cards are randomly chosen from a deck of 52 cards 
that consists of 13 cards of each of 4 different suits. Each 
of the selected cards is put in one of 4 piles, depending on 
the suit of the card. 


(a) What is the probability that the largest pile has 4 cards, 
the next largest has 3, the next largest has 2, and the small- 
est has 1 card? 

(b) What is the probability that two of the piles have 3 
cards, one has 4 cards, and one has no cards? 


20. Balls are randomly removed from an urn initially con- 
taining 20 red and 10 blue balls. What is the probability 
that all of the red balls are removed before all of the blue 
ones have been removed? 


3.1 Introduction


In this chapter, we introduce one of the most important concepts in probability 
theory, that of conditional probability. The importance of this concept is twofold. In 
the first place, we are often interested in calculating probabilities when some partial 
information concerning the result of an experiment is available; in such a situation, 
the desired probabilities are conditional. Second, even when no partial information 
is available, conditional probabilities can often be used to compute the desired prob- 
abilities more easily. 


3.2 Conditional Probabilities 


Suppose that we toss 2 dice, and suppose that each of the 36 possible outcomes is
equally likely to occur and hence has probability 1/36. Suppose further that we observe
that the first die is a 3. Then, given this information, what is the probability that the
sum of the 2 dice equals 8? To calculate this probability, we reason as follows: Given
that the initial die is a 3, there can be at most 6 possible outcomes of our experiment,
namely, (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), and (3, 6). Since each of these outcomes
originally had the same probability of occurring, the outcomes should still have equal
probabilities. That is, given that the first die is a 3, the (conditional) probability of
each of the outcomes (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), and (3, 6) is 1/6, whereas the
(conditional) probability of the other 30 points in the sample space is 0. Hence, the
desired probability will be 1/6.

If we let E and F denote, respectively, the event that the sum of the dice is 8 
and the event that the first die is a 3, then the probability just obtained is called the 
conditional probability that E occurs given that F has occurred and is denoted by 


P(E|F) 
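
The reduced-sample-space reasoning for the two dice can be reproduced by direct enumeration. This is a minimal sketch, not part of the text: the function name `cond_prob` and the predicates `sum_is_8` and `first_is_3` are hypothetical names introduced here, and the 36 outcomes are assumed equally likely as stated above.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
S = list(product(range(1, 7), repeat=2))

def cond_prob(E, F):
    """P(E|F) by counting: keep only outcomes in F, then count those also in E."""
    reduced = [s for s in S if F(s)]          # the reduced sample space
    favorable = [s for s in reduced if E(s)]  # outcomes in both E and F
    return Fraction(len(favorable), len(reduced))

def sum_is_8(s):
    return s[0] + s[1] == 8

def first_is_3(s):
    return s[0] == 3

p_conditional = cond_prob(sum_is_8, first_is_3)               # 1/6
p_unconditional = Fraction(sum(map(sum_is_8, S)), len(S))     # 5/36
print(p_conditional, p_unconditional)
```

Of the six outcomes with a 3 on the first die, only (3, 5) sums to 8, so the conditional probability is 1/6, compared with 5/36 when nothing is known about the first die.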


70 


Example 
2a 


Example 
2b 
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A general formula for P(E|F) that is valid for all events FE and F is derived in the 
same manner: If the event F occurs, then, in order for E to occur, it is necessary 
that the actual occurrence be a point both in E and in F; that is, it must be in EF. 
Now, since we know that F has occurred, it follows that F becomes our new, or 
reduced, sample space; hence, the probability that the event EF occurs will equal 
the probability of EF relative to the probability of F. That is, we have the following 
definition. 


Definition 
If P(F) > 0, then 
P(EF) 


(2.1) 


Joe is 80 percent certain that his missing key is in one of the two pockets of his 
hanging jacket, being 40 percent certain it is in the left-hand pocket and 40 percent 
certain it is in the right-hand pocket. If a search of the left-hand pocket does not find 
the key, what is the conditional probability that it is in the other pocket? 


Solution If we let L be the event that the key is in the left-hand pocket of the jacket, 
and R be the event that it is in the right-hand pocket, then the desired probability 
P(R|L^c) can be obtained as follows:


\[
P(R|L^c) = \frac{P(RL^c)}{P(L^c)} = \frac{P(R)}{1 - P(L)} = \frac{.4}{.6} = \frac{2}{3}
\]


If each outcome of a finite sample space S is equally likely, then, conditional on 
the event that the outcome lies in a subset F ⊂ S, all outcomes in F become equally
likely. In such cases, it is often convenient to compute conditional probabilities of 
the form P(E|F) by using F as the sample space. Indeed, working with this reduced 
sample space often results in an easier and better understood solution. Our next two 
examples illustrate this point. 


A coin is flipped twice. Assuming that all four points in the sample space S =
{(h, h), (h, t), (t, h), (t, t)} are equally likely, what is the conditional probability that
both flips land on heads, given that (a) the first flip lands on heads? (b) at least one 
flip lands on heads? 


Solution Let B = {(h, h)} be the event that both flips land on heads; let F = {(h, h),
(h, t)} be the event that the first flip lands on heads; and let A = {(h, h), (h, t), (t, h)} be
the event that at least one flip lands on heads. The probability for (a) can be obtained
from
\[
P(B|F) = \frac{P(BF)}{P(F)} = \frac{P(\{(h,h)\})}{P(\{(h,h),(h,t)\})} = \frac{1/4}{2/4} = \frac{1}{2}
\]

For (b), we have
\[
P(B|A) = \frac{P(BA)}{P(A)} = \frac{P(\{(h,h)\})}{P(\{(h,h),(h,t),(t,h)\})} = \frac{1/4}{3/4} = \frac{1}{3}
\]


Thus, the conditional probability that both flips land on heads given that the first 
one does is 1/2, whereas the conditional probability that both flips land on heads 
given that at least one does is only 1/3. Many students initially find this latter result 
surprising. They reason that given that at least one flip lands on heads, there are two 
possible results: Either they both land on heads or only one does. Their mistake, 
however, is in assuming that these two possibilities are equally likely. Initially there 
are 4 equally likely outcomes. Because the information that at least one flip lands on 
heads is equivalent to the information that the outcome is not (t, t), we are left with
the 3 equally likely outcomes (h, h), (h, t), (t, h), only one of which results in both flips
landing on heads.
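The two answers in this example can be confirmed by enumeration. The sketch below is illustrative only and its function names are hypothetical. It computes each conditional probability twice, once from the ratio P(EF)/P(F) and once by counting inside the reduced sample space, and the two routes agree.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("ht", repeat=2))  # 4 equally likely points

def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_by_definition(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

def cond_by_reduced_space(event, given):
    """P(event | given) by counting inside the reduced sample space."""
    reduced = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in reduced if event(o)), len(reduced))

def both_heads(o):
    return o == ("h", "h")

def first_heads(o):
    return o[0] == "h"

def at_least_one_head(o):
    return "h" in o

print(cond_by_definition(both_heads, first_heads))           # 1/2
print(cond_by_reduced_space(both_heads, at_least_one_head))  # 1/3
```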


In the card game bridge, the 52 cards are dealt out equally to 4 players—called East, 
West, North, and South. If North and South have a total of 8 spades among them, 
what is the probability that East has 3 of the remaining 5 spades? 


Solution Probably the easiest way to compute the desired probability is to work 
with the reduced sample space. That is, given that North-South have a total of 8 
spades among their 26 cards, there remains a total of 26 cards, exactly 5 of them 
being spades, to be distributed among the East-West hands. Since each distribution 
is equally likely, it follows that the conditional probability that East will have exactly 
3 spades among his or her 13 cards is
\[
\frac{\binom{5}{3}\binom{21}{10}}{\binom{26}{13}} \approx .339
\]
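A short numerical sketch of this reduced-sample-space count follows; it is an illustration rather than part of the original solution, and the function name east_spade_prob is hypothetical. East's 13 cards are chosen from the 26 East-West cards, so East must receive k of the 5 spades and 13 - k of the 21 non-spades.

```python
from fractions import Fraction
from math import comb

def east_spade_prob(k, spades=5, cards=26, hand=13):
    """P(East holds exactly k of the outstanding spades) in the reduced sample space."""
    return Fraction(comb(spades, k) * comb(cards - spades, hand - k), comb(cards, hand))

p3 = east_spade_prob(3)
print(p3, round(float(p3), 3))  # 39/115 0.339

# The probabilities for k = 0, ..., 5 must sum to 1.
print(sum(east_spade_prob(k) for k in range(6)))  # 1
```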


Multiplying both sides of Equation (2.1) by P(F), we obtain
P(EF) = P(F)P(E|F) (2.2) 


In words, Equation (2.2) states that the probability that both E and F occur is equal 
to the probability that F occurs multiplied by the conditional probability of E given 
that F occurred. Equation (2.2) is often quite useful in computing the probability of 
the intersection of events. 


Celine is undecided as to whether to take a French course or a chemistry course. She 
estimates that her probability of receiving an A grade would be 1/2 in a French course
and 2/3 in a chemistry course. If Celine decides to base her decision on the flip of a
fair coin, what is the probability that she gets an A in chemistry?


Solution Let C be the event that Celine takes chemistry and A denote the event
that she receives an A in whatever course she takes. Then the desired probability is
P(CA), which is calculated by using Equation (2.2) as follows:
\[
P(CA) = P(C)P(A|C) = \left(\frac{1}{2}\right)\left(\frac{2}{3}\right) = \frac{1}{3}
\]


Suppose that an urn contains 8 red balls and 4 white balls. We draw 2 balls from the 
urn without replacement. (a) If we assume that at each draw, each ball in the urn is 
equally likely to be chosen, what is the probability that both balls drawn are red? (b) 
Now suppose that the balls have different weights, with each red ball having weight 
r and each white ball having weight w. Suppose that the probability that a given ball
in the urn is the next one selected is its weight divided by the sum of the weights of 
all balls currently in the urn. Now what is the probability that both balls are red? 


Solution Let R_1 and R_2 denote, respectively, the events that the first and second
balls drawn are red. Now, given that the first ball selected is red, there are 7 remaining
red balls and 4 white balls, so P(R_2|R_1) = 7/11. As P(R_1) is clearly 8/12, the desired
probability is
\[
P(R_1R_2) = P(R_1)P(R_2|R_1) = \left(\frac{2}{3}\right)\left(\frac{7}{11}\right) = \frac{14}{33}
\]

Of course, this probability could have been computed by P(R_1R_2) = \binom{8}{2}/\binom{12}{2}.
For part (b), we again let R_i be the event that the ith ball chosen is red and use
\[
P(R_1R_2) = P(R_1)P(R_2|R_1)
\]


Now, number the red balls, and let B_i, i = 1, ..., 8 be the event that the first ball
drawn is red ball number i. Then
\[
P(R_1) = P\left(\bigcup_{i=1}^{8} B_i\right) = \sum_{i=1}^{8} P(B_i) = 8\,\frac{r}{8r + 4w}
\]
Moreover, given that the first ball is red, the urn then contains 7 red and 4 white 
balls. Thus, by an argument similar to the preceding one, 


\[
P(R_2|R_1) = \frac{7r}{7r + 4w}
\]


Hence, the probability that both balls are red is 


\[
P(R_1R_2) = \frac{8r}{8r + 4w} \cdot \frac{7r}{7r + 4w}
\]
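As a sanity check on the weighted-urn formula, the sketch below (illustrative, with a hypothetical function name) evaluates P(R_1)P(R_2|R_1) for given weights. Integer or Fraction weights are assumed so that the arithmetic stays exact. With equal weights it reduces to the answer 14/33 from part (a).

```python
from fractions import Fraction

def prob_both_red(r, w, reds=8, whites=4):
    """P(R1 R2) = P(R1) P(R2 | R1) when selection probability is proportional to weight."""
    p_r1 = Fraction(reds * r, reds * r + whites * w)
    # After one red ball is removed, (reds - 1) red and `whites` white balls remain.
    p_r2_given_r1 = Fraction((reds - 1) * r, (reds - 1) * r + whites * w)
    return p_r1 * p_r2_given_r1

print(prob_both_red(1, 1))  # equal weights: 14/33
print(prob_both_red(2, 1))  # red balls twice as heavy: 28/45
```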


A generalization of Equation (2.2), which provides an expression for the proba- 
bility of the intersection of an arbitrary number of events, is sometimes referred to 
as the multiplication rule. 


The multiplication rule 


\[
P(E_1E_2E_3 \cdots E_n) = P(E_1)P(E_2|E_1)P(E_3|E_1E_2) \cdots P(E_n|E_1 \cdots E_{n-1})
\]

In words, the multiplication rule states that P(E_1E_2 ··· E_n), the probability that
all of the events E_1, E_2, ..., E_n occur, is equal to P(E_1), the probability that E_1
occurs, multiplied by P(E_2|E_1), the conditional probability that E_2 occurs given
that E_1 has occurred, multiplied by P(E_3|E_1E_2), the conditional probability that E_3
occurs given that both E_1 and E_2 have occurred, and so on.

To prove the multiplication rule, just apply the definition of conditional proba- 
bility to its right-hand side, giving 


\[
P(E_1) \cdot \frac{P(E_1E_2)}{P(E_1)} \cdot \frac{P(E_1E_2E_3)}{P(E_1E_2)} \cdots \frac{P(E_1E_2 \cdots E_n)}{P(E_1E_2 \cdots E_{n-1})} = P(E_1E_2 \cdots E_n)
\]


In the match problem stated in Example 5m of Chapter 2, it was shown that P_N, the
probability that there are no matches when N people randomly select from among
their own N hats, is given by
\[
P_N = \sum_{i=0}^{N} (-1)^i/i!
\]
What is the probability that exactly k of the N people have matches? 


Solution Let us fix our attention on a particular set of k people and determine the 
probability that these k individuals have matches and no one else does. Letting E 
denote the event that everyone in this set has a match, and letting G be the event 
that none of the other N — k people have a match, we have 


P(EG) = P(E)P(G|E) 


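Now, let F_i, i = 1, ..., k, be the event that the ith member of the set has a match.
Then
\[
\begin{aligned}
P(E) &= P(F_1F_2 \cdots F_k) \\
&= P(F_1)P(F_2|F_1)P(F_3|F_1F_2) \cdots P(F_k|F_1 \cdots F_{k-1}) \\
&= \frac{1}{N} \cdot \frac{1}{N-1} \cdot \frac{1}{N-2} \cdots \frac{1}{N-k+1} \\
&= \frac{(N-k)!}{N!}
\end{aligned}
\]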


Given that everyone in the set of k has a match, the other N — k people will be 
randomly choosing among their own N — k hats, so the probability that none of 
them has a match is equal to the probability of no matches in a problem having 
N — k people choosing among their own N — k hats. Therefore, 


\[
P(G|E) = P_{N-k} = \sum_{i=0}^{N-k} (-1)^i/i!
\]


showing that the probability that a specified set of k people have matches and no
one else does is
\[
P(EG) = \frac{(N-k)!}{N!}\,P_{N-k}
\]

Because there will be exactly k matches if the preceding is true for any of the \binom{N}{k}
sets of k individuals, the desired probability is
\[
\begin{aligned}
P(\text{exactly } k \text{ matches}) &= \binom{N}{k}\frac{(N-k)!}{N!}\,P_{N-k} \\
&= P_{N-k}/k! \\
&\approx e^{-1}/k! \quad \text{when } N \text{ is large}
\end{aligned}
\]
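For small N the result P_{N-k}/k! can be verified by brute force. The following sketch is an illustration with hypothetical function names: it counts the permutations of N hats having exactly k fixed points and compares the count with the formula.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def no_match_prob(n):
    """P_n = sum_{i=0}^{n} (-1)^i / i!"""
    return sum(Fraction((-1) ** i, factorial(i)) for i in range(n + 1))

def exactly_k_formula(n, k):
    """P(exactly k matches) = P_{n-k} / k!"""
    return no_match_prob(n - k) / factorial(k)

def exactly_k_bruteforce(n, k):
    """Fraction of the n! hat assignments with exactly k people getting their own hat."""
    hits = sum(
        1
        for perm in permutations(range(n))
        if sum(1 for person, hat in enumerate(perm) if person == hat) == k
    )
    return Fraction(hits, factorial(n))

n = 6
agreement = [exactly_k_formula(n, k) == exactly_k_bruteforce(n, k) for k in range(n + 1)]
print(agreement)                 # every entry is True
print(exactly_k_formula(6, 2))   # 3/16
```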


An ordinary deck of 52 playing cards is randomly divided into 4 piles of 13 cards 
each. Compute the probability that each pile has exactly 1 ace. 


Solution Define events E_i, i = 1, 2, 3, 4, as follows:


E_1 = {the ace of spades is in any one of the piles}
E_2 = {the ace of spades and the ace of hearts are in different piles}
E_3 = {the aces of spades, hearts, and diamonds are all in different piles}
E_4 = {all 4 aces are in different piles}

The desired probability is P(E_1E_2E_3E_4), and by the multiplication rule,
\[
P(E_1E_2E_3E_4) = P(E_1)P(E_2|E_1)P(E_3|E_1E_2)P(E_4|E_1E_2E_3)
\]

Now,
\[
P(E_1) = 1
\]


since E_1 is the sample space S. To determine P(E_2|E_1), consider the pile that contains
the ace of spades. Because its remaining 12 cards are equally likely to be any
12 of the remaining 51 cards, the probability that the ace of hearts is among them is
12/51, giving that
\[
P(E_2|E_1) = 1 - \frac{12}{51} = \frac{39}{51}
\]
Also, given that the ace of spades and ace of hearts are in different piles, it follows 
that the set of the remaining 24 cards of these two piles is equally likely to be any set 
of 24 of the remaining 50 cards. As the probability that the ace of diamonds is one of 
these 24 is 24/50, we see that 


\[
P(E_3|E_1E_2) = 1 - \frac{24}{50} = \frac{26}{50}
\]
Because the same logic as used in the preceding yields that
\[
P(E_4|E_1E_2E_3) = 1 - \frac{36}{49} = \frac{13}{49}
\]


the probability that each pile has exactly 1 ace is 


\[
P(E_1E_2E_3E_4) = \frac{39 \cdot 26 \cdot 13}{51 \cdot 50 \cdot 49} \approx .105
\]

That is, there is approximately a 10.5 percent chance that each pile will contain an
ace. (Problem 13 gives another way of using the multiplication rule to solve this
problem.)
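The product of conditional probabilities can be cross-checked against a direct count. The sketch below is illustrative; the counting argument is a standard one assumed here and is not necessarily the method of Problem 13. It divides the number of ways to deal one ace into each of four labeled piles by the total number of ways to form the piles.

```python
from fractions import Fraction
from math import factorial

# Multiplication rule: P(E1) P(E2|E1) P(E3|E1E2) P(E4|E1E2E3).
factors = [Fraction(1), Fraction(39, 51), Fraction(26, 50), Fraction(13, 49)]
p_rule = Fraction(1)
for f in factors:
    p_rule *= f

# Direct count: 4! ways to assign the aces to the piles, then split the other
# 48 cards 12 per pile; compare with all ways to split 52 cards 13 per pile.
total = factorial(52) // factorial(13) ** 4
favorable = factorial(4) * (factorial(48) // factorial(12) ** 4)
p_count = Fraction(favorable, total)

print(p_rule, round(float(p_rule), 3))  # 2197/20825 0.105
print(p_rule == p_count)                # True
```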




Four of the eight teams in the quarterfinal round of the 2016 European Champions
League Football (soccer) tournament were the acknowledged strong teams
Barcelona, Bayern Munich, Real Madrid, and Paris St-Germain. The pairings in this
round are supposed to be totally random, in the sense that all possible pairings are 
equally likely. Assuming this is so, find the probability that none of the strong teams 
play each other in this round. (Surprisingly, it seems to be a common occurrence 
in this tournament that, even though the pairings are supposedly random, the very 
strong teams are rarely matched against each other in this round.) 


Solution If we number the four strong teams 1 through 4, and then let W_i, i =
1, 2, 3, 4, be the event that team i plays one of the four weak teams, then the desired
probability is P(W_1W_2W_3W_4). By the multiplication rule,
\[
\begin{aligned}
P(W_1W_2W_3W_4) &= P(W_1)P(W_2|W_1)P(W_3|W_1W_2)P(W_4|W_1W_2W_3) \\
&= (4/7)(3/5)(2/3)(1) \\
&= 8/35
\end{aligned}
\]


The preceding follows by first noting that because team 1 is equally likely to be 
matched with any of the other 7 teams, we have that P(W_1) = 4/7. Now, given that
W_1 occurs, team 2 is equally likely to be matched with any of five teams: namely,
teams 3, 4, or any of the three weak teams not matched with team 1. As three of
these five teams are weak, we see that P(W_2|W_1) = 3/5. Similarly, given that events
W_1 and W_2 have occurred, team 3 is equally likely to be matched with any from a
set of three teams, consisting of team 4 and the remaining two weaker teams not
matched with 1 or 2. Hence, P(W_3|W_1W_2) = 2/3. Finally, given that W_1, W_2, and W_3
all occur, team 4 will be matched with the remaining weak team not matched with
any of 1, 2, 3, giving that P(W_4|W_1W_2W_3) = 1.
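Because there are only 105 ways to pair eight teams, the answer 8/35 can also be confirmed by listing every pairing. The sketch below is illustrative and the helper name pairings is hypothetical. Teams 1 through 4 play the role of the strong teams.

```python
from fractions import Fraction

def pairings(teams):
    """Yield every way to split the list `teams` into unordered pairs."""
    if not teams:
        yield []
        return
    first, rest = teams[0], teams[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

strong = {1, 2, 3, 4}
all_pairings = list(pairings(list(range(1, 9))))

# Keep the pairings in which no match involves two strong teams.
no_clash = [
    p for p in all_pairings
    if not any(a in strong and b in strong for a, b in p)
]

print(len(all_pairings))                            # 105
print(Fraction(len(no_clash), len(all_pairings)))   # 8/35
```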


Remarks Our definition of P(E|F) is consistent with the interpretation of probability 
as being a long-run relative frequency. To see this, suppose that n repetitions of the 
experiment are to be performed, where n is large. We claim that if we consider only
those experiments in which F occurs, then P(E|F) will equal the long-run propor- 
tion of them in which E also occurs. To verify this statement, note that since P(F) 
is the long-run proportion of experiments in which F occurs, it follows that in the 
repetitions of the experiment, F will occur approximately nP(F) times. Similarly, in 
approximately nP(EF) of these experiments, both E and F will occur. Hence, out of
the approximately nP(F) experiments in which F occurs, the proportion of them in 
which E also occurs is approximately equal to 


\[
\frac{nP(EF)}{nP(F)} = \frac{P(EF)}{P(F)}
\]


Because this approximation becomes exact as n becomes larger and larger, we have 
the appropriate definition of P(E|F). 


3.3 Bayes’s Formula 
Let E and F be events. We may express E as 
\[
E = EF \cup EF^c
\]

for, in order for an outcome to be in E, it must either be in both E and F or be in
E but not in F. (See Figure 3.1.) As EF and EF^c are clearly mutually exclusive, we
have, by Axiom 3,
\[
\begin{aligned}
P(E) &= P(EF) + P(EF^c) \\
&= P(E|F)P(F) + P(E|F^c)P(F^c) \\
&= P(E|F)P(F) + P(E|F^c)[1 - P(F)]
\end{aligned}
\tag{3.1}
\]


Equation (3.1) states that the probability of the event E is a weighted average of the 
conditional probability of E given that F has occurred and the conditional probability
of E given that F has not occurred, each conditional probability being given
as much weight as the event on which it is conditioned has of occurring. This is an 
extremely useful formula, because its use often enables us to determine the prob- 
ability of an event by first “conditioning” upon whether or not some second event 
has occurred. That is, there are many instances in which it is difficult to compute the 
probability of an event directly, but it is straightforward to compute it once we know 
whether or not some second event has occurred. We illustrate this idea with some 
examples. 


Figure 3.1 E = EF ∪ EF^c. EF = Shaded Area; EF^c = Striped Area.


(Part 1) 


An insurance company believes that people can be divided into two classes: those 
who are accident prone and those who are not. The company’s statistics show that 
an accident-prone person will have an accident at some time within a fixed 1-year 
period with probability .4, whereas this probability decreases to .2 for a person who 
is not accident prone. If we assume that 30 percent of the population is accident 
prone, what is the probability that a new policyholder will have an accident within a 
year of purchasing a policy? 


Solution We shall obtain the desired probability by first conditioning upon whether 
or not the policyholder is accident prone. Let A_1 denote the event that the policyholder
will have an accident within a year of purchasing the policy, and let A denote
the event that the policyholder is accident prone. Hence, the desired probability is
given by
\[
\begin{aligned}
P(A_1) &= P(A_1|A)P(A) + P(A_1|A^c)P(A^c) \\
&= (.4)(.3) + (.2)(.7) = .26
\end{aligned}
\]


(Part 2) 


Suppose that a new policyholder has an accident within a year of purchasing a policy. 
What is the probability that he or she is accident prone?


Solution The desired probability is
\[
\begin{aligned}
P(A|A_1) &= \frac{P(AA_1)}{P(A_1)} \\
&= \frac{P(A)P(A_1|A)}{P(A_1)} \\
&= \frac{(.3)(.4)}{.26} = \frac{6}{13}
\end{aligned}
\]
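Both parts of the insurance example fit in a few lines of exact arithmetic. The sketch below is illustrative only and its variable names are hypothetical: it first conditions on whether the policyholder is accident prone, then inverts the conditioning.

```python
from fractions import Fraction

p_prone = Fraction(3, 10)            # P(A)
p_acc_given_prone = Fraction(4, 10)  # P(A1 | A)
p_acc_given_not = Fraction(2, 10)    # P(A1 | A^c)

# Part 1: condition on whether the policyholder is accident prone.
p_accident = p_acc_given_prone * p_prone + p_acc_given_not * (1 - p_prone)

# Part 2: P(A | A1) = P(A) P(A1 | A) / P(A1).
p_prone_given_accident = p_prone * p_acc_given_prone / p_accident

print(p_accident, float(p_accident))  # 13/50 0.26
print(p_prone_given_accident)         # 6/13
```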


Consider the following game played with an ordinary deck of 52 playing cards: The 
cards are shuffled and then turned over one at a time. At any time, the player can 
guess that the next card to be turned over will be the ace of spades; if it is, then the 
player wins. In addition, the player is said to win if the ace of spades has not yet 
appeared when only one card remains and no guess has yet been made. What is a 
good strategy? What is a bad strategy? 


Solution Every strategy has probability 1/52 of winning! To show this, we will use 
induction to prove the stronger result that for an n card deck, one of whose cards 
is the ace of spades, the probability of winning is 1/n, no matter what strategy is 
employed. Since this is clearly true for n = 1, assume it to be true for an n − 1
card deck, and now consider an n card deck. Fix any strategy, and let p denote the
probability that the strategy guesses that the first card is the ace of spades. Given 
that it does, the player’s probability of winning is 1/n. If, however, the strategy does 
not guess that the first card is the ace of spades, then the probability that the player 
wins is the probability that the first card is not the ace of spades, namely, (n — 1)/n, 
multiplied by the conditional probability of winning given that the first card is not 
the ace of spades. But this latter conditional probability is equal to the probability of 
winning when using an n − 1 card deck containing a single ace of spades; it is thus,
by the induction hypothesis, 1/(n — 1). Hence, given that the strategy does not guess 
the first card, the probability of winning is 


\[
\frac{n-1}{n} \cdot \frac{1}{n-1} = \frac{1}{n}
\]


Thus, letting G be the event that the first card is guessed, we obtain 
\[
P\{\text{win}\} = P\{\text{win}|G\}P(G) + P\{\text{win}|G^c\}(1 - P(G)) = \frac{1}{n}p + \frac{1}{n}(1 - p) = \frac{1}{n}
\]


In answering a question on a multiple-choice test, a student either knows the answer 
or guesses. Let p be the probability that the student knows the answer and 1 — p 
be the probability that the student guesses. Assume that a student who guesses at 
the answer will be correct with probability 1/m, where m is the number of multiple- 
choice alternatives. What is the conditional probability that a student knew the answer 
to a question given that he or she answered it correctly? 


Solution Let C and K denote, respectively, the events that the student answers the 
question correctly and the event that he or she actually knows the answer. 
Now,
\[
\begin{aligned}
P(K|C) &= \frac{P(KC)}{P(C)} \\
&= \frac{P(C|K)P(K)}{P(C|K)P(K) + P(C|K^c)P(K^c)} \\
&= \frac{p}{p + (1/m)(1 - p)} \\
&= \frac{mp}{1 + (m-1)p}
\end{aligned}
\]


For example, if m = 5, p = 1/2, then the probability that the student knew the answer
to a question he or she answered correctly is 5/6.
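The closed form mp/[1 + (m − 1)p] is easy to check against the unsimplified Bayes ratio. The sketch below is illustrative, with hypothetical function names, and assumes P(C|K) = 1, that is, a student who knows the answer always answers correctly.

```python
from fractions import Fraction

def knew_given_correct(p, m):
    """P(K|C) from the Bayes ratio, with P(C|K) = 1 and P(C|K^c) = 1/m."""
    p = Fraction(p)
    return p / (p + Fraction(1, m) * (1 - p))

def knew_given_correct_closed_form(p, m):
    """The simplified expression mp / (1 + (m - 1)p)."""
    p = Fraction(p)
    return m * p / (1 + (m - 1) * p)

print(knew_given_correct(Fraction(1, 2), 5))  # 5/6

# The two expressions agree over a grid of p and m values.
grid_ok = all(
    knew_given_correct(Fraction(i, 10), m) == knew_given_correct_closed_form(Fraction(i, 10), m)
    for i in range(1, 10)
    for m in range(2, 7)
)
print(grid_ok)  # True
```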


A laboratory blood test is 95 percent effective in detecting a certain disease when it 
is, in fact, present. However, the test also yields a “false positive” result for 1 percent 
of the healthy persons tested. (That is, if a healthy person is tested, then, with prob- 
ability .01, the test result will imply that he or she has the disease.) If .5 percent of 
the population actually has the disease, what is the probability that a person has the 
disease given that the test result is positive? 


Solution Let D be the event that the person tested has the disease and E the event 
that the test result is positive. Then the desired probability is 


\[
\begin{aligned}
P(D|E) &= \frac{P(DE)}{P(E)} \\
&= \frac{P(E|D)P(D)}{P(E|D)P(D) + P(E|D^c)P(D^c)} \\
&= \frac{(.95)(.005)}{(.95)(.005) + (.01)(.995)} \\
&= \frac{95}{294} \approx .323
\end{aligned}
\]
Thus, only 32 percent of those persons whose test results are positive actually have 
the disease. Many students are often surprised at this result (they expect the per- 
centage to be much higher, since the blood test seems to be a good one), so it is 
probably worthwhile to present a second argument that, although less rigorous than 
the preceding one, is probably more revealing. We now do so. 

Since .5 percent of the population actually has the disease, it follows that, on 
the average, 1 person out of every 200 tested will have it. The test will correctly 
confirm that this person has the disease with probability .95. Thus, on the average, 
out of every 200 persons tested, the test will correctly confirm that .95 person has 
the disease. On the other hand, out of the (on the average) 199 healthy people, the 
test will incorrectly state that (199)(.01) of these people have the disease. Hence, 
for every .95 diseased persons that the test correctly states is ill, there are (on the 
average) (199)(.01) healthy persons who the test incorrectly states are ill. Thus, the 
proportion of time that the test result is correct when it states that a person is ill is 


\[
\frac{.95}{.95 + (199)(.01)} = \frac{95}{294} \approx .323
\]
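The "out of every 200 persons tested" argument translates directly into expected counts. The sketch below is illustrative and the function name is hypothetical. It returns the expected numbers of true and false positives per group of tested people, together with their ratio.

```python
from fractions import Fraction

def positive_test_breakdown(prevalence, sensitivity, false_positive_rate, tested=200):
    """Expected true positives, expected false positives, and P(D|E) per `tested` people."""
    true_pos = tested * prevalence * sensitivity
    false_pos = tested * (1 - prevalence) * false_positive_rate
    return true_pos, false_pos, true_pos / (true_pos + false_pos)

tp, fp, p_disease_given_positive = positive_test_breakdown(
    Fraction(5, 1000), Fraction(95, 100), Fraction(1, 100)
)
print(tp, fp)                             # 19/20 199/100
print(p_disease_given_positive)           # 95/294
print(round(float(p_disease_given_positive), 3))  # 0.323
```

The ratio does not depend on the group size, so the choice tested=200 only makes the counts match the verbal argument.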


Equation (3.1) is also useful when one has to reassess one’s personal probabili- 
ties in the light of additional information. For instance, consider the examples that 
follow.


Consider a medical practitioner pondering the following dilemma: “If I’m at least 80
percent certain that my patient has this disease, then I always recommend surgery, 
whereas if I’m not quite as certain, then I recommend additional tests that are expen- 
sive and sometimes painful. Now, initially I was only 60 percent certain that Jones 
had the disease, so I ordered the series A test, which always gives a positive result 
when the patient has the disease and almost never does when he is healthy. The test 
result was positive, and I was all set to recommend surgery when Jones informed me, 
for the first time, that he was diabetic. This information complicates matters because, 
although it doesn’t change my original 60 percent estimate of his chances of having 
the disease in question, it does affect the interpretation of the results of the A test. 
This is so because the A test, while never yielding a positive result when the patient 
is healthy, does unfortunately yield a positive result 30 percent of the time in the case 
of diabetic patients who are not suffering from the disease. Now what do I do? More 
tests or immediate surgery?” 


Solution In order to decide whether or not to recommend surgery, the doctor should 
first compute her updated probability that Jones has the disease given that the A test 
result was positive. Let D denote the event that Jones has the disease and E the event
that the A test result is positive. The desired conditional probability is then 


\[
\begin{aligned}
P(D|E) &= \frac{P(DE)}{P(E)} \\
&= \frac{P(D)P(E|D)}{P(E|D)P(D) + P(E|D^c)P(D^c)} \\
&= \frac{(.6)1}{1(.6) + (.3)(.4)} \\
&\approx .833
\end{aligned}
\]


Note that we have computed the probability of a positive test result by condition- 
ing on whether or not Jones has the disease and then using the fact that because 
Jones is a diabetic, his conditional probability of a positive result given that he 
does not have the disease, P(E|D^c), equals .3. Hence, as the doctor should now
be more than 80 percent certain that Jones has the disease, she should recommend
surgery.


At a certain stage of a criminal investigation, the inspector in charge is 60 percent 
convinced of the guilt of a certain suspect. Suppose, however, that a new piece of 
evidence which shows that the criminal has a certain characteristic (such as left- 
handedness, baldness, or brown hair) is uncovered. If 20 percent of the population 
possesses this characteristic, how certain of the guilt of the suspect should the inspec- 
tor now be if it turns out that the suspect has the characteristic? 


Solution Letting G denote the event that the suspect is guilty and C the event that 
he possesses the characteristic of the criminal, we have 
\[
\begin{aligned}
P(G|C) &= \frac{P(GC)}{P(C)} \\
&= \frac{P(C|G)P(G)}{P(C|G)P(G) + P(C|G^c)P(G^c)} \\
&= \frac{1(.6)}{1(.6) + (.2)(.4)} \\
&\approx .882
\end{aligned}
\]


where we have supposed that the probability of the suspect having the characteristic 
if he is, in fact, innocent is equal to .2, the proportion of the population possessing 
the characteristic.


In the world bridge championships held in Buenos Aires in May 1965, the famous 
British bridge partnership of Terrence Reese and Boris Schapiro was accused of 
cheating by using a system of finger signals that could indicate the number of hearts 
held by the players. Reese and Schapiro denied the accusation, and eventually a 
hearing was held by the British bridge league. The hearing was in the form of a 
legal proceeding with prosecution and defense teams, both having the power to call 
and cross-examine witnesses. During the course of the proceeding, the prosecutor 
examined specific hands played by Reese and Schapiro and claimed that their play- 
ing these hands was consistent with the hypothesis that they were guilty of having 
illicit knowledge of the heart suit. At this point, the defense attorney pointed out 
that their play of these hands was also perfectly consistent with their standard line of 
play. However, the prosecution then argued that as long as their play was consistent 
with the hypothesis of guilt, it must be counted as evidence toward that hypothesis. 
What do you think of the reasoning of the prosecution? 


Solution The problem is basically one of determining how the introduction of new 
evidence (in this example, the playing of the hands) affects the probability of a par- 
ticular hypothesis. If we let H denote a particular hypothesis (such as the hypothesis 
that Reese and Schapiro are guilty) and E the new evidence, then 


\[
\begin{aligned}
P(H|E) &= \frac{P(HE)}{P(E)} \\
&= \frac{P(E|H)P(H)}{P(E|H)P(H) + P(E|H^c)[1 - P(H)]}
\end{aligned}
\tag{3.2}
\]


where P(H) is our evaluation of the likelihood of the hypothesis before the introduction
of the new evidence. The new evidence will be in support of the hypothesis
whenever it makes the hypothesis more likely, that is, whenever P(H|E) ≥ P(H).
From Equation (3.2), this will be the case whenever
\[
P(E|H) \geq P(E|H)P(H) + P(E|H^c)[1 - P(H)]
\]

or, equivalently, whenever
\[
P(E|H) \geq P(E|H^c)
\]


In other words, any new evidence can be considered to be in support of a particular 
hypothesis only if its occurrence is more likely when the hypothesis is true than when 
it is false. In fact, the new probability of the hypothesis depends on its initial probability
and the ratio of these conditional probabilities, since, from Equation (3.2),
\[
P(H|E) = \frac{P(H)}{P(H) + [1 - P(H)]\dfrac{P(E|H^c)}{P(E|H)}}
\]

Hence, in the problem under consideration, the play of the cards can be considered
to support the hypothesis of guilt only if such play would have been more
likely if the partnership were cheating than if it were not. As the prosecutor never 
made this claim, his assertion that the evidence is in support of the guilt hypothesis 
is invalid.


Twins can be either identical or fraternal. Identical, also called monozygotic, twins 
form when a single fertilized egg splits into two genetically identical parts. Con- 
sequently, identical twins always have the same set of genes. Fraternal, also called 
dizygotic, twins develop when two eggs are fertilized and implant in the uterus. The 
genetic connection of fraternal twins is no more or less the same as siblings born at 
separate times. A Los Angeles County, California, scientist wishing to know the cur- 
rent fraction of twin pairs born in the county that are identical twins has assigned a 
county statistician to study this issue. The statistician initially requested each hospital 
in the county to record all twin births, indicating whether or not the resulting twins 
were identical. The hospitals, however, told her that to determine whether newborn 
twins were identical was not a simple task, as it involved the permission of the twins’ 
parents to perform complicated and expensive DNA studies that the hospitals could 
not afford. After some deliberation, the statistician just asked the hospitals for data 
listing all twin births along with an indication as to whether the twins were of the 
same sex. When such data indicated that approximately 64 percent of twin births 
were same-sexed, the statistician declared that approximately 28 percent of all twins 
were identical. How did she come to this conclusion? 


Solution The statistician reasoned that identical twins are always of the same sex, 
whereas fraternal twins, having the same relationship to each other as any pair of 
siblings, will have probability 1/2 of being of the same sex. Letting I be the event
that a pair of twins is identical, and SS be the event that a pair of twins is of the same 
sex, she computed the probability P(SS) by conditioning on whether the twin pair 
was identical. This gave 


\[
P(SS) = P(SS|I)P(I) + P(SS|I^c)P(I^c)
\]
or
\[
P(SS) = 1 \times P(I) + \frac{1}{2} \times [1 - P(I)] = \frac{1}{2} + \frac{1}{2}P(I)
\]
which, using that P(SS) ≈ .64, yielded the result
\[
P(I) \approx .28
\]

The change in the probability of a hypothesis when new evidence is introduced
can be expressed compactly in terms of the change in the odds of that hypothesis,
where the concept of odds is defined as follows.


Definition 

The odds of an event A are defined by
\[
\frac{P(A)}{P(A^c)} = \frac{P(A)}{1 - P(A)}
\]


Example
3i

That is, the odds of an event A tell how much more likely it is that the event A
occurs than it is that it does not occur. For instance, if P(A) = 2/3, then P(A) =
2P(A^c), so the odds are 2. If the odds are equal to α, then it is common to say
that the odds are “α to 1” in favor of the hypothesis.


Consider now a hypothesis H that is true with probability P(H), and suppose 
that new evidence E is introduced. Then, the conditional probabilities, given the
evidence E, that H is true and that H is not true are respectively given by
\[
P(H|E) = \frac{P(E|H)P(H)}{P(E)} \qquad P(H^c|E) = \frac{P(E|H^c)P(H^c)}{P(E)}
\]


Therefore, the new odds after the evidence E has been introduced are
\[
\frac{P(H|E)}{P(H^c|E)} = \frac{P(H)}{P(H^c)} \cdot \frac{P(E|H)}{P(E|H^c)} \tag{3.3}
\]


That is, the new value of the odds of H is the old value multiplied by the ratio of the 
conditional probability of the new evidence given that H is true to the conditional 
probability given that H is not true. Thus, Equation (3.3) verifies the result of Exam- 
ple 3f, since the odds, and thus the probability of H, increase whenever the new evi- 
dence is more likely when H is true than when it is false. Similarly, the odds decrease 
whenever the new evidence is more likely when H is false than when it is true. 
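The odds form of the update is compact enough to express in a few helper functions. The sketch below is illustrative and its function names are hypothetical. It reuses the numbers of the criminal-investigation example (prior .6, evidence certain under guilt, probability .2 under innocence) and recovers the posterior of about .882.

```python
from fractions import Fraction

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

def prob_from_odds(o):
    """Inverse of odds(): the probability corresponding to odds o."""
    return o / (1 + o)

def posterior_odds(prior_prob, like_if_true, like_if_false):
    """New odds = old odds * P(E|H) / P(E|H^c)."""
    return odds(prior_prob) * like_if_true / like_if_false

new_odds = posterior_odds(Fraction(6, 10), Fraction(1), Fraction(2, 10))
print(new_odds)                  # 15/2
print(prob_from_odds(new_odds))  # 15/17
```

Since 15/17 ≈ .882, the odds route reproduces the answer obtained from the probability form.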


An urn contains two type A coins and one type B coin. When a type A coin is flipped, 
it comes up heads with probability 1/4, whereas when a type B coin is flipped, it 
comes up heads with probability 3/4. A coin is randomly chosen from the urn and 
flipped. Given that the flip landed on heads, what is the probability that it was a type 
A coin? 


Solution Let A be the event that a type A coin was flipped, and let B = A^c be the
event that a type B coin was flipped. We want P(A|heads), where heads is the event 
that the flip landed on heads. From Equation (3.3), we see that 


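\[
\begin{aligned}
\frac{P(A|\text{heads})}{P(A^c|\text{heads})} &= \frac{P(A)}{P(B)} \cdot \frac{P(\text{heads}|A)}{P(\text{heads}|B)} \\
&= \frac{2/3}{1/3} \cdot \frac{1/4}{3/4} \\
&= 2/3
\end{aligned}
\]

Hence, the odds are 2/3 : 1, or, equivalently, the probability is 2/5 that a type A coin
was flipped.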


Equation (3.1) may be generalized as follows: Suppose that F_1, F_2, ..., F_n are
mutually exclusive events such that
\[
\bigcup_{i=1}^{n} F_i = S
\]

In other words, exactly one of the events F_1, F_2, ..., F_n must occur. By writing
\[
E = \bigcup_{i=1}^{n} EF_i
\]
and using the fact that the events EF_i, i = 1, ..., n are mutually exclusive, we obtain
\[
\begin{aligned}
P(E) &= \sum_{i=1}^{n} P(EF_i) \\
&= \sum_{i=1}^{n} P(E|F_i)P(F_i)
\end{aligned}
\tag{3.4}
\]


Thus, Equation (3.4), often referred to as the law of total probability, shows how,
for given events F_1, F_2, ..., F_n, of which one and only one must occur, we can compute
P(E) by first conditioning on which one of the F_i occurs. That is, Equation (3.4)
states that P(E) is equal to a weighted average of P(E|F_i), each term being weighted
by the probability of the event on which it is conditioned.
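A minimal sketch of the law of total probability as a weighted average follows. The function name total_probability is hypothetical, and the dice illustration is an assumption chosen for this sketch: E is the event that two fair dice sum to 8, and F_i is the event that the first die shows i.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(E) = sum_i P(E|F_i) P(F_i) for mutually exclusive, exhaustive F_i."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one P(E|F_i) for each P(F_i)")
    if sum(priors) != 1:
        raise ValueError("the P(F_i) must sum to 1")
    return sum(like * prior for like, prior in zip(likelihoods, priors))

# F_i: first die shows i, each with probability 1/6.
priors = [Fraction(1, 6)] * 6
# P(sum is 8 | first die is i): impossible when i = 1, otherwise 1/6.
likelihoods = [Fraction(0)] + [Fraction(1, 6)] * 5

print(total_probability(priors, likelihoods))  # 5/36
```

Exact fractions are used so that the check that the P(F_i) sum to 1 is not disturbed by rounding.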


In Example 5j of Chapter 2, we considered the probability that, for a randomly shuffled
deck, the card following the first ace is some specified card, and we gave a
combinatorial argument to show that this probability is 1/52. Here is a probabilistic
argument based on conditioning: Let E be the event that the card following the first
ace is some specified card, say, card x. To compute P(E), we ignore card x and condition
on the relative ordering of the other 51 cards in the deck. Letting O be the
ordering gives
\[
P(E) = \sum_{O} P(E|O)P(O)
\]


Now, given O, there are 52 possible orderings of the cards, corresponding to having 
card x being the ith card in the deck, i = 1,...,52. But because all 52! possible order- 
ings were initially equally likely, it follows that, conditional on O, each of the 52 
remaining possible orderings is equally likely. Because card x will follow the first ace 
for only one of these orderings, we have P(E|O) = 1/52, implying that P(E) = 1/52. 
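A 52-card deck has too many orderings to enumerate, but the same conditioning argument predicts probability 1/n for any n-card deck, and that can be checked exhaustively on a small one. The sketch below is illustrative only: the six-card deck with two aces and all names in it are assumptions made for this check.

```python
from fractions import Fraction
from itertools import permutations

def prob_follows_first_ace(deck, aces, x):
    """Exact P(the card right after the first ace is x), over all orderings of deck."""
    hits = total = 0
    for order in permutations(deck):
        total += 1
        first_ace = next(i for i, card in enumerate(order) if card in aces)
        if first_ace + 1 < len(order) and order[first_ace + 1] == x:
            hits += 1
    return Fraction(hits, total)

deck = ["A1", "A2", "c1", "c2", "c3", "c4"]  # hypothetical 6-card deck with 2 aces
aces = {"A1", "A2"}

print(prob_follows_first_ace(deck, aces, "c1"))  # 1/6
print(prob_follows_first_ace(deck, aces, "A2"))  # 1/6
```

The answer is 1/6 whether the specified card is an ace or not, mirroring the 1/52 obtained for the full deck.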

Again, let F_1, ..., F_n be a set of mutually exclusive and exhaustive events (meaning
that exactly one of these events must occur).
Suppose now that E has occurred and we are interested in determining which
one of the F_j also occurred. Then, by Equation (3.4), we have the following proposition.


Proposition 
3.1 

$$P(F_j|E) = \frac{P(EF_j)}{P(E)} = \frac{P(E|F_j)P(F_j)}{\sum_{i=1}^{n} P(E|F_i)P(F_i)} \qquad (3.5)$$


Equation (3.5) is known as Bayes’s formula, after the English philosopher Thomas 
Bayes. If we think of the events $F_j$ as being possible “hypotheses” about some 
subject matter, then Bayes’s formula may be interpreted as showing us how opinions 
about these hypotheses held before the experiment was carried out [that is, the 
$P(F_j)$] should be modified by the evidence produced by the experiment. 


Example 
3k 


Example 
3l 

A plane is missing, and it is presumed that it was equally likely to have gone down in 
any of 3 possible regions. Let $1 - \beta_i$, $i = 1, 2, 3$, denote the probability that the plane 
will be found upon a search of the $i$th region when the plane is, in fact, in that region. 
(The constants $\beta_i$ are called overlook probabilities, because they represent the 
probability of overlooking the plane; they are generally attributable to the geographical 
and environmental conditions of the regions.) What is the conditional probability 
that the plane is in the $i$th region given that a search of region 1 is unsuccessful? 


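Solution Let $R_i$, $i = 1, 2, 3$, be the event that the plane is in region $i$, and let $E$ be 
the event that a search of region 1 is unsuccessful. From Bayes’s formula, we obtain 

$$\begin{aligned}
P(R_1|E) &= \frac{P(ER_1)}{P(E)} \\
&= \frac{P(E|R_1)P(R_1)}{\sum_{i=1}^{3} P(E|R_i)P(R_i)} \\
&= \frac{(\beta_1)\frac{1}{3}}{(\beta_1)\frac{1}{3} + (1)\frac{1}{3} + (1)\frac{1}{3}} \\
&= \frac{\beta_1}{\beta_1 + 2}
\end{aligned}$$

For $j = 2, 3$, 

$$\begin{aligned}
P(R_j|E) &= \frac{P(E|R_j)P(R_j)}{P(E)} \\
&= \frac{(1)\frac{1}{3}}{(\beta_1)\frac{1}{3} + \frac{1}{3} + \frac{1}{3}} \\
&= \frac{1}{\beta_1 + 2}, \quad j = 2, 3
\end{aligned}$$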

Note that the updated (that is, the conditional) probability that the plane is in 
region $j$, given the information that a search of region 1 did not find it, is greater than 
the initial probability that it was in region $j$ when $j \neq 1$ and is less than the initial 
probability when $j = 1$. This statement is certainly intuitive, since not finding the plane 
in region 1 would seem to decrease its chance of being in that region and increase 
its chance of being elsewhere. Further, the conditional probability that the plane is 
in region 1 given an unsuccessful search of that region is an increasing function of 
the overlook probability $\beta_1$. This statement is also intuitive, since the larger $\beta_1$ is, the 
more it is reasonable to attribute the unsuccessful search to “bad luck” as opposed 
to the plane’s not being there. Similarly, $P(R_j|E)$, $j \neq 1$, is a decreasing function of 
$\beta_1$. 
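The update in this example can be sketched numerically. The function name `posterior_after_failed_search` and the value $\beta_1 = 2/5$ are illustrative assumptions, not from the text; the code simply applies Bayes’s formula with equal priors and likelihoods $\beta_1, 1, 1$ for an unsuccessful search of region 1.

```python
from fractions import Fraction


def posterior_after_failed_search(beta1, n_regions=3):
    """Posterior P(R_j | E) for j = 1..n_regions, where E is the event that
    a search of region 1 fails and beta1 is its overlook probability."""
    prior = Fraction(1, n_regions)
    # P(E | R_1) = beta1; the search of region 1 surely fails otherwise.
    likelihoods = [Fraction(beta1)] + [Fraction(1)] * (n_regions - 1)
    joint = [lik * prior for lik in likelihoods]
    total = sum(joint)                      # P(E), by total probability
    return [j / total for j in joint]


beta1 = Fraction(2, 5)                      # assumed overlook probability
post = posterior_after_failed_search(beta1)
print(post)
```

With these assumed inputs the first entry equals $\beta_1/(\beta_1 + 2)$ and the others equal $1/(\beta_1 + 2)$, so region 1 drops below its prior of $1/3$ while regions 2 and 3 rise above it.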


The next example has often been used by unscrupulous probability students to 
win money from their less enlightened friends. 


Suppose that we have 3 cards that are identical in form, except that both sides of the 
first card are colored red, both sides of the second card are colored black, and one 
side of the third card is colored red and the other side black. The 3 cards are mixed 
up in a hat, and 1 card is randomly selected and put down on the ground. If the upper 
side of the chosen card is colored red, what is the probability that the other side is 
colored black? 


Solution Let RR, BB, and RB denote, respectively, the events that the chosen card 
is all red, all black, or the red—black card. Also, let R be the event that the upturned 
side of the chosen card is red. Then, the desired probability is obtained by 


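$$\begin{aligned}
P(RB|R) &= \frac{P(RB \cap R)}{P(R)} \\
&= \frac{P(R|RB)P(RB)}{P(R|RR)P(RR) + P(R|RB)P(RB) + P(R|BB)P(BB)} \\
&= \frac{\left(\frac{1}{2}\right)\left(\frac{1}{3}\right)}{(1)\left(\frac{1}{3}\right) + \left(\frac{1}{2}\right)\left(\frac{1}{3}\right) + 0\left(\frac{1}{3}\right)} = \frac{1}{3}
\end{aligned}$$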


Hence, the answer is $\frac{1}{3}$. Some students guess $\frac{1}{2}$ as the answer by incorrectly reasoning 
that given that a red side appears, there are two equally likely possibilities: that the 
card is the all-red card or the red-black card. Their mistake, however, is in assuming 
that these two possibilities are equally likely. For, if we think of each card as 
consisting of two distinct sides, then we see that there are 6 equally likely outcomes of 
the experiment, namely, $R_1, R_2, B_1, B_2, R_3, B_3$, where the outcome is $R_1$ if the first 
side of the all-red card is turned face up, $R_2$ if the second side of the all-red card 
is turned face up, $R_3$ if the red side of the red-black card is turned face up, and so 
on. Since the other side of the upturned red side will be black only if the outcome is 
$R_3$, we see that the desired probability is the conditional probability of $R_3$ given that 
either $R_1$ or $R_2$ or $R_3$ occurred, which obviously equals $\frac{1}{3}$. 
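The six-outcome argument can be sketched by listing every (upturned side, hidden side) pair. The variable names below are illustrative, not from the text.

```python
from fractions import Fraction

# Each card is a pair of side colors; either side may land face up.
cards = [("R", "R"), ("B", "B"), ("R", "B")]

# Six equally likely outcomes: (upturned side, hidden side).
outcomes = [(up, down) for a, b in cards for up, down in ((a, b), (b, a))]

red_up = [o for o in outcomes if o[0] == "R"]
p_black_given_red = Fraction(
    sum(1 for o in red_up if o[1] == "B"), len(red_up)
)
print(p_black_given_red)
```

Three of the six outcomes show red, and only one of those has a black hidden side, which is why the two "possibilities" in the incorrect argument are not equally likely.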


A new couple, known to have two children, has just moved into town. Suppose that 
the mother is encountered walking with one of her children. If this child is a girl, 
what is the probability that both children are girls? 


Solution Let us start by defining the following events: 


$G_1$: the first (that is, the oldest) child is a girl. 
$G_2$: the second child is a girl. 
$G$: the child seen with the mother is a girl. 


Also, let $B_1$, $B_2$, and $B$ denote similar events, except that “girl” is replaced by “boy.” 
Now, the desired probability is $P(G_1G_2|G)$, which can be expressed as 
follows: 

$$P(G_1G_2|G) = \frac{P(G_1G_2G)}{P(G)} = \frac{P(G_1G_2)}{P(G)}$$

Also, 

$$\begin{aligned}
P(G) &= P(G|G_1G_2)P(G_1G_2) + P(G|G_1B_2)P(G_1B_2) \\
&\quad + P(G|B_1G_2)P(B_1G_2) + P(G|B_1B_2)P(B_1B_2) \\
&= P(G_1G_2) + P(G|G_1B_2)P(G_1B_2) + P(G|B_1G_2)P(B_1G_2)
\end{aligned}$$


where the final equation used the results $P(G|G_1G_2) = 1$ and $P(G|B_1B_2) = 0$. If we 
now make the usual assumption that all 4 gender possibilities are equally likely, then 
we see that 

$$P(G_1G_2|G) = \frac{\frac{1}{4}}{\frac{1}{4} + P(G|G_1B_2)/4 + P(G|B_1G_2)/4} = \frac{1}{1 + P(G|G_1B_2) + P(G|B_1G_2)}$$


Thus, the answer depends on whatever assumptions we want to make about the 
conditional probabilities that the child seen with the mother is a girl given the event 
$G_1B_2$ and that the child seen with the mother is a girl given the event $G_2B_1$. For 
instance, if we want to assume, on the one hand, that, independently of the 
genders of the children, the child walking with the mother is the elder child with some 
probability $p$, then it would follow that 

$$P(G|G_1B_2) = p = 1 - P(G|B_1G_2)$$

implying under this scenario that 

$$P(G_1G_2|G) = \frac{1}{2}$$

If, on the other hand, we were to assume that if the children are of different genders, 
then the mother would choose to walk with the girl with probability $q$, independently 
of the birth order of the children, then we would have 

$$P(G|G_1B_2) = P(G|B_1G_2) = q$$

implying that 

$$P(G_1G_2|G) = \frac{1}{1 + 2q}$$

For instance, if we took $q = 1$, meaning that the mother would always choose to walk 
with a daughter, then the conditional probability that she has two daughters would 
be $\frac{1}{3}$, which is in accord with Example 2b because seeing the mother with a daughter 
is now equivalent to the event that she has at least one daughter. 

Hence, as stated, the problem is incapable of solution. Indeed, even when the 
usual assumption about equally likely gender probabilities is made, we still need to 
make additional assumptions before a solution can be given. This is because the 
sample space of the experiment consists of vectors of the form $s_1, s_2, i$, where $s_1$ is the 
gender of the older child, $s_2$ is the gender of the younger child, and $i$ identifies the 
birth order of the child seen with the mother. As a result, to specify the probabilities 
of the events of the sample space, it is not enough to make assumptions only about 
the genders of the children; it is also necessary to assume something about the 
conditional probabilities as to which child is with the mother given the genders of the 
children. 
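The dependence on the extra assumption can be made concrete by computing over the sample space of vectors $(s_1, s_2, i)$. This is a sketch only: the function `prob_both_girls` and its dictionary argument are hypothetical names, and the values of $p$ and $q$ are arbitrary illustrations.

```python
from fractions import Fraction
from itertools import product


def prob_both_girls(p_elder_given):
    """P(G1 G2 | G) when the four gender pairs are equally likely and
    p_elder_given[(s1, s2)] is the assumed probability that the child
    seen with the mother is the elder one, given genders (s1, s2)."""
    num = Fraction(0)   # P(G1 G2 G)
    den = Fraction(0)   # P(G)
    for s1, s2 in product("GB", repeat=2):
        weight = Fraction(1, 4)
        p_elder = p_elder_given[(s1, s2)]
        for seen, pr in ((s1, p_elder), (s2, 1 - p_elder)):
            if seen == "G":
                den += weight * pr
                if s1 == s2 == "G":
                    num += weight * pr
    return num / den


pairs = list(product("GB", repeat=2))

# Scenario 1: the elder child is seen with probability p, whatever the genders.
p = Fraction(3, 10)
scenario_p = prob_both_girls({pair: p for pair in pairs})

# Scenario 2: with mixed genders, the girl is seen with probability q.
q = Fraction(1)
elder = {("G", "G"): Fraction(1, 2), ("B", "B"): Fraction(1, 2),
         ("G", "B"): q, ("B", "G"): 1 - q}
scenario_q = prob_both_girls(elder)

print(scenario_p, scenario_q)
```

Under the first assumption the result is $1/2$ for any $p$; under the second it is $1/(1 + 2q)$, which gives $1/3$ at $q = 1$.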
A bin contains 3 types of disposable flashlights. The probability that a type 1 
flashlight will give more than 100 hours of use is .7, with the corresponding probabilities 
for type 2 and type 3 flashlights being .4 and .3, respectively. Suppose that 20 
percent of the flashlights in the bin are type 1, 30 percent are type 2, and 50 percent are 
type 3. 


(a) What is the probability that a randomly chosen flashlight will give more than 
100 hours of use? 


(b) Given that a flashlight lasted more than 100 hours, what is the conditional prob- 
ability that it was a type j flashlight, j = 1,2,3? 


Solution (a) Let $A$ denote the event that the flashlight chosen will give more than 
100 hours of use, and let $F_j$ be the event that a type $j$ flashlight is chosen, $j = 1, 2, 3$. 
To compute $P(A)$, we condition on the type of the flashlight, to obtain 

$$\begin{aligned}
P(A) &= P(A|F_1)P(F_1) + P(A|F_2)P(F_2) + P(A|F_3)P(F_3) \\
&= (.7)(.2) + (.4)(.3) + (.3)(.5) = .41
\end{aligned}$$

There is a 41 percent chance that the flashlight will last for more than 100 hours. 
(b) The probability is obtained by using Bayes’s formula: 

$$P(F_j|A) = \frac{P(AF_j)}{P(A)} = \frac{P(A|F_j)P(F_j)}{.41}$$

Thus, 

$$\begin{aligned}
P(F_1|A) &= (.7)(.2)/.41 = 14/41 \\
P(F_2|A) &= (.4)(.3)/.41 = 12/41 \\
P(F_3|A) &= (.3)(.5)/.41 = 15/41
\end{aligned}$$

For instance, whereas the initial probability that a type 1 flashlight is chosen is only 
.2, the information that the flashlight has lasted more than 100 hours raises the 
probability of this event to $14/41 \approx .341$. 
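Parts (a) and (b) are direct applications of Equations (3.4) and (3.5), which can be sketched as two small helper functions. The names `total_probability` and `bayes_posteriors` are illustrative, not from the text; exact fractions are used so the results can be compared with 14/41, 12/41, and 15/41.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """Equation (3.4): P(E) = sum_i P(E | F_i) P(F_i)."""
    return sum(p * lik for p, lik in zip(priors, likelihoods))


def bayes_posteriors(priors, likelihoods):
    """Equation (3.5): P(F_j | E) for every j."""
    p_e = total_probability(priors, likelihoods)
    return [p * lik / p_e for p, lik in zip(priors, likelihoods)]


# Flashlight data from the example: P(F_j) and P(A | F_j), j = 1, 2, 3.
priors = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]
likelihoods = [Fraction(7, 10), Fraction(4, 10), Fraction(3, 10)]

p_a = total_probability(priors, likelihoods)
posteriors = bayes_posteriors(priors, likelihoods)
print(p_a, posteriors)
```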


A crime has been committed by a solitary individual, who left some DNA at the 
scene of the crime. Forensic scientists who studied the recovered DNA noted that 
only five strands could be identified and that each innocent person, independently, 
would have a probability of $10^{-5}$ of having his or her DNA match on all five strands. 
The district attorney supposes that the perpetrator of the crime could be any of the 
1 million residents of the town. Ten thousand of these residents have been released 
from prison within the past 10 years; consequently, a sample of their DNA is on file. 
Before any checking of the DNA file, the district attorney thinks that each of the 
10,000 ex-criminals has probability $\alpha$ of being guilty of the new crime, whereas each 
of the remaining 990,000 residents has probability $\beta$, where $\alpha = c\beta$. (That is, the 
district attorney supposes that each recently released convict is $c$ times as likely to 
be the crime’s perpetrator as is each town member who is not a recently released 
convict.) When the DNA that is analyzed is compared against the database of the 
10,000 ex-convicts, it turns out that A. J. Jones is the only one whose DNA matches 
the profile. Assuming that the district attorney’s estimate of the relationship between 
$\alpha$ and $\beta$ is accurate, what is the probability that A. J. is guilty? 

Solution To begin, note that because probabilities must sum to 1, we have 

$$1 = 10{,}000\alpha + 990{,}000\beta = (10{,}000c + 990{,}000)\beta$$

Thus, 

$$\beta = \frac{1}{10{,}000c + 990{,}000}, \qquad \alpha = \frac{c}{10{,}000c + 990{,}000}$$

Now, let $G$ be the event that A. J. is guilty, and let $M$ denote the event that A. J. is 
the only one of the 10,000 on file to have a match. Then, 

$$P(G|M) = \frac{P(GM)}{P(M)} = \frac{P(G)P(M|G)}{P(M|G)P(G) + P(M|G^c)P(G^c)}$$


On the one hand, if A. J. is guilty, then he will be the only one to have a DNA match 
if none of the others on file have a match. Therefore, 

$$P(M|G) = (1 - 10^{-5})^{9999}$$

On the other hand, if A. J. is innocent, then in order for him to be the only match, his 
DNA must match (which will occur with probability $10^{-5}$), all others in the database 
must be innocent, and none of these others can have a match. Now, given that A. J. 
is innocent, the conditional probability that all the others in the database are also 
innocent is 

$$P(\text{all others innocent}|\text{AJ innocent}) = \frac{P(\text{all in database innocent})}{P(\text{AJ innocent})} = \frac{1 - 10{,}000\alpha}{1 - \alpha}$$

Also, the conditional probability, given their innocence, that none of the others in 
the database will have a match is $(1 - 10^{-5})^{9999}$. Therefore, 

$$P(M|G^c) = 10^{-5}\left(\frac{1 - 10{,}000\alpha}{1 - \alpha}\right)(1 - 10^{-5})^{9999}$$

Because $P(G) = \alpha$, the preceding formula gives 

$$P(G|M) = \frac{\alpha}{\alpha + 10^{-5}(1 - 10{,}000\alpha)} = \frac{1}{.9 + \frac{10^{-5}}{\alpha}}$$


Thus, if the district attorney’s initial thoughts were that an arbitrary ex-convict was 
100 times more likely to have committed the crime than was a nonconvict (that is, 
$c = 100$), then $\alpha = \frac{1}{19{,}900}$ and 

$$P(G|M) = \frac{1}{1.099} \approx 0.9099$$

If the district attorney initially thought that the appropriate ratio was $c = 10$, then 
$\alpha = \frac{1}{109{,}000}$ and 

$$P(G|M) = \frac{1}{1.99} \approx 0.5025$$

If the district attorney initially thought that the criminal was equally likely to be any 
of the members of the town ($c = 1$), then $\alpha = 10^{-6}$ and 

$$P(G|M) = \frac{1}{10.9} \approx 0.0917$$

Thus, the probability ranges from approximately 9 percent when the district 
attorney’s initial assumption is that all the members of the population have the same 
chance of being the perpetrator to approximately 91 percent when she assumes 
that each ex-convict is 100 times more likely to be the criminal than is a specified 
townsperson who is not an ex-convict. 
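The three cases can be reproduced with a short function. This is a sketch under the example’s assumptions; the function name and its keyword parameters (`n_file`, `n_other`, `match_prob`) are hypothetical and default to the figures in the example.

```python
def prob_guilty_given_match(c, n_file=10_000, n_other=990_000, match_prob=1e-5):
    """P(G | M) when each person on file is c times as likely to be guilty
    as each person not on file, and A. J. is the only match on file."""
    beta = 1 / (n_file * c + n_other)   # prior for a resident not on file
    alpha = c * beta                    # prior for a resident on file
    # The common factor (1 - match_prob)**(n_file - 1) cancels in the ratio.
    return alpha / (alpha + match_prob * (1 - n_file * alpha))


for c in (100, 10, 1):
    print(c, round(prob_guilty_given_match(c), 4))
```

The posterior is very sensitive to the prior ratio $c$, which is the point the example makes: the same DNA evidence supports quite different conclusions under different initial assumptions.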


3.4 Independent Events 


Example 
4a 


Example 
4b 
The previous examples in this chapter show that $P(E|F)$, the conditional probability 
of $E$ given $F$, is not generally equal to $P(E)$, the unconditional probability of $E$. 
In other words, knowing that $F$ has occurred generally changes the chances of $E$’s 
occurrence. In the special cases where $P(E|F)$ does in fact equal $P(E)$, we say that $E$ 
is independent of $F$. That is, $E$ is independent of $F$ if knowledge that $F$ has occurred 
does not change the probability that $E$ occurs. 

Since $P(E|F) = P(EF)/P(F)$, it follows that $E$ is independent of $F$ if 

$$P(EF) = P(E)P(F) \qquad (4.1)$$


The fact that Equation (4.1) is symmetric in E and F shows that whenever E is inde- 
pendent of F, F is also independent of E. We thus have the following definition. 


Definition 


Two events E and F are said to be independent if Equation (4.1) holds. 
Two events E and F that are not independent are said to be dependent. 


A card is selected at random from an ordinary deck of 52 playing cards. If $E$ is the 
event that the selected card is an ace and $F$ is the event that it is a spade, then $E$ 
and $F$ are independent. This follows because $P(EF) = \frac{1}{52}$, whereas $P(E) = \frac{4}{52}$ and 
$P(F) = \frac{13}{52}$. 


Two coins are flipped, and all 4 outcomes are assumed to be equally likely. If $E$ is 
the event that the first coin lands on heads and $F$ the event that the second lands 
on tails, then $E$ and $F$ are independent, since $P(EF) = P(\{(H, T)\}) = \frac{1}{4}$, whereas 
$P(E) = P(\{(H, H), (H, T)\}) = \frac{1}{2}$ and $P(F) = P(\{(H, T), (T, T)\}) = \frac{1}{2}$. 

Suppose that we toss 2 fair dice. Let $E_1$ denote the event that the sum of the dice is 
6 and $F$ denote the event that the first die equals 4. Then 

$$P(E_1F) = P(\{(4, 2)\}) = \frac{1}{36}$$

whereas 

$$P(E_1)P(F) = \left(\frac{5}{36}\right)\left(\frac{1}{6}\right) = \frac{5}{216}$$


Example 
4d 


Proposition 
4.1 
Hence, $E_1$ and $F$ are not independent. Intuitively, the reason for this is clear because 
if we are interested in the possibility of throwing a 6 (with 2 dice), we shall be quite 
happy if the first die lands on 4 (or, indeed, on any of the numbers 1, 2, 3, 4, and 5), 
for then we shall still have a possibility of getting a total of 6. If, however, the first 
die landed on 6, we would be unhappy because we would no longer have a chance 
of getting a total of 6. In other words, our chance of getting a total of 6 depends on 
the outcome of the first die; thus, $E_1$ and $F$ cannot be independent. 

Now, suppose that we let $E_2$ be the event that the sum of the dice equals 7. Is $E_2$ 
independent of $F$? The answer is yes, since 

$$P(E_2F) = P(\{(4, 3)\}) = \frac{1}{36}$$

whereas 

$$P(E_2)P(F) = \left(\frac{1}{6}\right)\left(\frac{1}{6}\right) = \frac{1}{36}$$

We leave it for the reader to present the intuitive argument why the event that 
the sum of the dice equals 7 is independent of the outcome on the first die. 
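Both dice claims can be checked by enumerating the 36 equally likely outcomes and testing Equation (4.1) directly. The helper names `prob` and `independent` below are illustrative, not from the text.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event), where `event` is a predicate on an outcome."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))


def independent(a, b):
    """Check Equation (4.1): P(AB) = P(A)P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)


def sum_is_6(o):
    return o[0] + o[1] == 6


def sum_is_7(o):
    return o[0] + o[1] == 7


def first_is_4(o):
    return o[0] == 4


print(independent(sum_is_6, first_is_4))  # E_1 and F
print(independent(sum_is_7, first_is_4))  # E_2 and F
```

The first check fails and the second succeeds, matching the two computations with the products $5/216$ and $1/36$.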


If we let $E$ denote the event that the next president is a Republican and $F$ the event 
that there will be a major earthquake within the next year, then most people would 
probably be willing to assume that $E$ and $F$ are independent. However, there would 
probably be some controversy over whether it is reasonable to assume that $E$ is 
independent of $G$, where $G$ is the event that there will be a recession within two 
years after the election. 


We now show that if $E$ is independent of $F$, then $E$ is also independent of $F^c$. 


If $E$ and $F$ are independent, then so are $E$ and $F^c$. 
Proof Assume that $E$ and $F$ are independent. Since $E = EF \cup EF^c$ and $EF$ and $EF^c$ 
are obviously mutually exclusive, we have 

$$\begin{aligned}
P(E) &= P(EF) + P(EF^c) \\
&= P(E)P(F) + P(EF^c)
\end{aligned}$$

or, equivalently, 

$$\begin{aligned}
P(EF^c) &= P(E)[1 - P(F)] \\
&= P(E)P(F^c)
\end{aligned}$$

and the result is proved. 


Thus, if $E$ is independent of $F$, then the probability of $E$’s occurrence is unchanged 
by information as to whether or not $F$ has occurred. 

Suppose now that E is independent of F and is also independent of G. Is E 
then necessarily independent of FG? The answer, somewhat surprisingly, is no, as 
the following example demonstrates. 


Two fair dice are thrown. Let $E$ denote the event that the sum of the dice is 7. Let $F$ 
denote the event that the first die equals 4 and $G$ denote the event that the second 
die equals 3. From Example 4c, we know that $E$ is independent of $F$, and the same 
reasoning as applied there shows that $E$ is also independent of $G$; but clearly, $E$ is 
not independent of $FG$ [since $P(E|FG) = 1$]. 


It would appear to follow from Example 4e that an appropriate definition of the 
independence of three events $E$, $F$, and $G$ would have to go further than merely 
assuming that all of the $\binom{3}{2}$ pairs of events are independent. We are thus led to the 
following definition. 


Definition 
Three events $E$, $F$, and $G$ are said to be independent if 

$$\begin{aligned}
P(EFG) &= P(E)P(F)P(G) \\
P(EF) &= P(E)P(F) \\
P(EG) &= P(E)P(G) \\
P(FG) &= P(F)P(G)
\end{aligned}$$


Note that if $E$, $F$, and $G$ are independent, then $E$ will be independent of any 
event formed from $F$ and $G$. For instance, $E$ is independent of $F \cup G$, since 

$$\begin{aligned}
P[E(F \cup G)] &= P(EF \cup EG) \\
&= P(EF) + P(EG) - P(EFG) \\
&= P(E)P(F) + P(E)P(G) - P(E)P(FG) \\
&= P(E)[P(F) + P(G) - P(FG)] \\
&= P(E)P(F \cup G)
\end{aligned}$$


Of course, we may also extend the definition of independence to more than 
three events. The events $E_1, E_2, \ldots, E_n$ are said to be independent if for every subset 
$E_{1'}, E_{2'}, \ldots, E_{r'}$, $r \le n$, of these events, 

$$P(E_{1'}E_{2'} \cdots E_{r'}) = P(E_{1'})P(E_{2'}) \cdots P(E_{r'})$$


Finally, we define an infinite set of events to be independent if every finite subset of 
those events is independent. 
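The subset-based definition can be sketched as a check over every subcollection of two or more events. The function `independent` and the representation of events as sets of dice outcomes are illustrative choices, not from the text; the events $E$, $F$, $G$ are those of Example 4e.

```python
from fractions import Fraction
from itertools import combinations, product

OUTCOMES = list(product(range(1, 7), repeat=2))  # two fair dice


def prob(event):
    """P(event) for an event given as a set of equally likely outcomes."""
    return Fraction(len(event), len(OUTCOMES))


def independent(events):
    """True if the product rule holds for every subset of size >= 2."""
    for r in range(2, len(events) + 1):
        for subset in combinations(events, r):
            intersection = set.intersection(*subset)
            expected = Fraction(1)
            for ev in subset:
                expected *= prob(ev)
            if prob(intersection) != expected:
                return False
    return True


E = {o for o in OUTCOMES if o[0] + o[1] == 7}   # sum is 7
F = {o for o in OUTCOMES if o[0] == 4}          # first die is 4
G = {o for o in OUTCOMES if o[1] == 3}          # second die is 3

pairwise = all(independent(list(pair)) for pair in combinations([E, F, G], 2))
print(pairwise)                # every pair satisfies Equation (4.1)
print(independent([E, F, G]))  # the triple condition fails
```

Here all three pairs are independent, yet $P(EFG) = 1/36$ while $P(E)P(F)P(G) = 1/216$, so pairwise independence alone is not enough.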

Sometimes, a probability experiment under consideration consists of performing 
a sequence of subexperiments. For instance, if the experiment consists of continually 
tossing a coin, we may think of each toss as being a subexperiment. In many cases, 
it is reasonable to assume that the outcomes of any group of the subexperiments 
have no effect on the probabilities of the outcomes of the other subexperiments. If 
such is the case, we say that the subexperiments are independent. More formally, 
we say that the subexperiments are independent if $E_1, E_2, \ldots, E_n, \ldots$ is necessarily 
an independent sequence of events whenever $E_i$ is an event whose occurrence is 
completely determined by the outcome of the $i$th subexperiment. 

If each subexperiment has the same set of possible outcomes, then the subex- 
periments are often called trials. 


An infinite sequence of independent trials is to be performed. Each trial results in a 
success with probability $p$ and a failure with probability $1 - p$. What is the 
probability that 


(a) at least 1 success occurs in the first $n$ trials; 
(b) exactly $k$ successes occur in the first $n$ trials; 
(c) all trials result in successes? 


Solution In order to determine the probability of at least 1 success in the first $n$ 
trials, it is easiest to compute first the probability of the complementary event: that 
of no successes in the first $n$ trials. If we let $E_i$ denote the event of a failure on the $i$th 
trial, then the probability of no successes is, by independence, 

$$P(E_1E_2 \cdots E_n) = P(E_1)P(E_2) \cdots P(E_n) = (1 - p)^n$$

Hence, the answer to part (a) is $1 - (1 - p)^n$. 

To compute the answer to part (b), consider any particular sequence of the first 
$n$ outcomes containing $k$ successes and $n - k$ failures. Each one of these sequences 
will, by the assumed independence of trials, occur with probability $p^k(1 - p)^{n-k}$. 
Since there are $\binom{n}{k}$ such sequences [there are $n!/k!(n - k)!$ permutations of $k$ 
successes and $n - k$ failures], the desired probability in part (b) is 

$$P\{\text{exactly } k \text{ successes}\} = \binom{n}{k} p^k(1 - p)^{n-k}$$

To answer part (c), we note that, by part (a), the probability of the first $n$ trials 
all resulting in success is given by 

$$P(E_1^cE_2^c \cdots E_n^c) = p^n$$

Thus, using the continuity property of probabilities (Section 2.6), we see that the 
desired probability is given by 

$$\begin{aligned}
P\left(\bigcap_{i=1}^{\infty} E_i^c\right) &= P\left(\lim_{n \to \infty} \bigcap_{i=1}^{n} E_i^c\right) \\
&= \lim_{n \to \infty} P\left(\bigcap_{i=1}^{n} E_i^c\right) \\
&= \lim_{n} p^n = \begin{cases} 0 & \text{if } p < 1 \\ 1 & \text{if } p = 1 \end{cases}
\end{aligned}$$
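Parts (a) and (b) translate directly into code. The function names and the sample values $n = 5$, $p = 0.5$ below are illustrative assumptions, not from the text.

```python
from math import comb


def prob_at_least_one(n, p):
    """Part (a): P(at least 1 success in the first n trials)."""
    return 1 - (1 - p) ** n


def prob_exactly(n, k, p):
    """Part (b): P(exactly k successes in the first n trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 5, 0.5
print(prob_at_least_one(n, p))
print([prob_exactly(n, k, p) for k in range(n + 1)])
```

For part (c), `p ** n` shrinks toward 0 as `n` grows whenever `p < 1`, which is the limit computed above.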


A system composed of $n$ separate components is said to be a parallel system if it 
functions when at least one of the components functions. (See Figure 3.2.) For such 
a system, if component $i$, which is independent of the other components, functions 
with probability $p_i$, $i = 1, \ldots, n$, what is the probability that the system functions? 


Solution Let $A_i$ denote the event that component $i$ functions. Then, 

$$\begin{aligned}
P\{\text{system functions}\} &= 1 - P\{\text{system does not function}\} \\
&= 1 - P\{\text{all components do not function}\} \\
&= 1 - P\left(\bigcap_{i} A_i^c\right) \\
&= 1 - \prod_{i=1}^{n} (1 - p_i) \quad \text{by independence}
\end{aligned}$$


Figure 3.2 Parallel System: Functions if Current Flows from A to B. 
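The parallel-system formula is a one-line computation. In this sketch the function name and the component probabilities are illustrative assumptions.

```python
from math import prod


def parallel_system_reliability(p):
    """P(system functions) = 1 - prod_i (1 - p_i) for independent components."""
    return 1 - prod(1 - p_i for p_i in p)


# Hypothetical component reliabilities.
print(parallel_system_reliability([0.9, 0.8, 0.5]))
```

Adding a component can only raise the result, since each extra factor $1 - p_i$ is at most 1.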


Independent trials consisting of rolling a pair of fair dice are performed. What is the 
probability that an outcome of 5 appears before an outcome of 7 when the outcome 
of a roll is the sum of the dice? 


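Solution If we let $E_n$ denote the event that no 5 or 7 appears on the first $n - 1$ trials 
and a 5 appears on the $n$th trial, then the desired probability is 

$$P\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} P(E_n)$$

Now, since $P\{5 \text{ on any trial}\} = \frac{4}{36}$ and $P\{7 \text{ on any trial}\} = \frac{6}{36}$, we obtain, by the 
independence of trials, 

$$P(E_n) = \left(1 - \frac{10}{36}\right)^{n-1} \frac{4}{36}$$

Thus, 

$$\begin{aligned}
P\left(\bigcup_{n=1}^{\infty} E_n\right) &= \frac{1}{9} \sum_{n=1}^{\infty} \left(\frac{13}{18}\right)^{n-1} \\
&= \frac{1}{9} \cdot \frac{1}{1 - \frac{13}{18}} \\
&= \frac{2}{5}
\end{aligned}$$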


This result could also have been obtained by the use of conditional probabilities. 
If we let $E$ be the event that a 5 occurs before a 7, then we can obtain the desired 
probability, $P(E)$, by conditioning on the outcome of the first trial, as follows: Let 
$F$ be the event that the first trial results in a 5, let $G$ be the event that it results in 
a 7, and let $H$ be the event that the first trial results in neither a 5 nor a 7. Then, 
conditioning on which one of these events occurs gives 

$$P(E) = P(E|F)P(F) + P(E|G)P(G) + P(E|H)P(H)$$

However, 

$$\begin{aligned}
P(E|F) &= 1 \\
P(E|G) &= 0 \\
P(E|H) &= P(E)
\end{aligned}$$

The first two equalities are obvious. The third follows because if the first outcome 
results in neither a 5 nor a 7, then at that point the situation is exactly as it was when 
the problem first started: the experimenter will continually roll a pair of fair 
dice until either a 5 or 7 appears. Furthermore, the trials are independent; therefore, 
the outcome of the first trial will have no effect on subsequent rolls of the dice. Since 
$P(F) = \frac{4}{36}$, $P(G) = \frac{6}{36}$, and $P(H) = \frac{26}{36}$, it follows that 

$$P(E) = \frac{1}{9} + P(E)\frac{13}{18}$$

or 

$$P(E) = \frac{2}{5}$$

The reader should note that the answer is quite intuitive. That is, because a 5 
occurs on any roll with probability $\frac{4}{36}$ and a 7 with probability $\frac{6}{36}$, it seems intuitive 
that the odds that a 5 appears before a 7 should be 6 to 4 against. The probability 
should then be $\frac{4}{10}$, as indeed it is. 


The same argument shows that if $E$ and $F$ are mutually exclusive events of an 
experiment, then, when independent trials of the experiment are performed, the 
event $E$ will occur before the event $F$ with probability 

$$\frac{P(E)}{P(E) + P(F)}$$
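The series solution and the closed form $P(E)/[P(E) + P(F)]$ can be compared for the dice example. This is a sketch: the variable names are illustrative, and truncating the infinite series at 200 terms is an assumption made only so the sum can be computed.

```python
from fractions import Fraction
from itertools import product

# Sums of two fair dice over the 36 equally likely outcomes.
sums = [a + b for a, b in product(range(1, 7), repeat=2)]
p5 = Fraction(sums.count(5), 36)   # P{5 on any trial}
p7 = Fraction(sums.count(7), 36)   # P{7 on any trial}

# Closed form: E occurs before F with probability P(E) / (P(E) + P(F)).
closed_form = p5 / (p5 + p7)

# Partial sum of sum_n (1 - p5 - p7)^(n-1) * p5, truncated at 200 terms.
partial_sum = sum((1 - p5 - p7) ** (n - 1) * p5 for n in range(1, 201))

print(closed_form, float(partial_sum))
```

The truncated series differs from the closed form only by the geometric tail, which is negligible after 200 terms.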


Suppose there are $n$ types of coupons and that each new coupon collected is, 
independent of previous selections, a type $i$ coupon with probability $p_i$, $\sum_{i=1}^{n} p_i = 1$. 
Suppose $k$ coupons are to be collected. If $A_i$ is the event that there is at least one 
type $i$ coupon among those collected, then, for $i \neq j$, find 


(a) $P(A_i)$ 
(b) $P(A_i \cup A_j)$ 
(c) $P(A_i|A_j)$ 


Solution 

$$\begin{aligned}
P(A_i) &= 1 - P(A_i^c) \\
&= 1 - P\{\text{no coupon is type } i\} \\
&= 1 - (1 - p_i)^k
\end{aligned}$$

where the preceding used that each coupon is, independently, not of type $i$ with 
probability $1 - p_i$. Similarly, 

$$\begin{aligned}
P(A_i \cup A_j) &= 1 - P((A_i \cup A_j)^c) \\
&= 1 - P\{\text{no coupon is either type } i \text{ or type } j\} \\
&= 1 - (1 - p_i - p_j)^k
\end{aligned}$$

where the preceding used that each coupon is, independently, neither of type $i$ nor 
type $j$ with probability $1 - p_i - p_j$. 
To determine $P(A_i|A_j)$, we will use the identity 

$$P(A_i \cup A_j) = P(A_i) + P(A_j) - P(A_iA_j)$$

which, in conjunction with parts (a) and (b), yields 

$$\begin{aligned}
P(A_iA_j) &= 1 - (1 - p_i)^k + 1 - (1 - p_j)^k - [1 - (1 - p_i - p_j)^k] \\
&= 1 - (1 - p_i)^k - (1 - p_j)^k + (1 - p_i - p_j)^k
\end{aligned}$$

Consequently, 

$$P(A_i|A_j) = \frac{P(A_iA_j)}{P(A_j)} = \frac{1 - (1 - p_i)^k - (1 - p_j)^k + (1 - p_i - p_j)^k}{1 - (1 - p_j)^k}$$
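The three closed forms can be checked against a brute-force sum over every possible sequence of $k$ coupons. This is a sketch for small cases only; the function names and the probabilities $1/2$, $1/3$, $1/6$ with $k = 4$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def coupon_formulas(p, i, j, k):
    """Return (P(A_i), P(A_i or A_j), P(A_i | A_j)) from the closed forms."""
    p_ai = 1 - (1 - p[i]) ** k
    p_aj = 1 - (1 - p[j]) ** k
    p_union = 1 - (1 - p[i] - p[j]) ** k
    p_both = p_ai + p_aj - p_union          # inclusion-exclusion identity
    return p_ai, p_union, p_both / p_aj


def coupon_brute_force(p, i, j, k):
    """Same three quantities by summing over all len(p)**k coupon sequences."""
    p_ai = p_aj = p_union = p_both = Fraction(0)
    for seq in product(range(len(p)), repeat=k):
        weight = Fraction(1)
        for t in seq:
            weight *= p[t]                  # independence of selections
        has_i, has_j = i in seq, j in seq
        if has_i:
            p_ai += weight
        if has_j:
            p_aj += weight
        if has_i or has_j:
            p_union += weight
        if has_i and has_j:
            p_both += weight
    return p_ai, p_union, p_both / p_aj


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # assumed type probabilities
print(coupon_formulas(p, 0, 1, 4))
print(coupon_brute_force(p, 0, 1, 4))
```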


The next example presents a problem that occupies an honored place in the his- 
tory of probability theory. This is the famous problem of the points. In general terms, 
the problem is this: Two players put up stakes and play some game, with the stakes 
to go to the winner of the game. An interruption requires them to stop before either 
has won and when each has some sort of a “partial score.” How should the stakes be 
divided? 

This problem was posed to the French mathematician Blaise Pascal in 1654 by 
the Chevalier de Méré, who was a professional gambler at that time. In attacking 
the problem, Pascal introduced the important idea that the proportion of the prize 
deserved by the competitors should depend on their respective probabilities of win- 
ning if the game were to be continued at that point. Pascal worked out some special 
cases and, more importantly, initiated a correspondence with the famous French- 
man Pierre de Fermat, who had a reputation as a great mathematician. The resulting 
exchange of letters not only led to a complete solution to the problem of the points, 
but also laid the framework for the solution to many other problems connected with 
games of chance. This celebrated correspondence, considered by some as the birth 
date of probability theory, was also important in stimulating interest in probability 
among the mathematicians in Europe, for Pascal and Fermat were both recognized 
as being among the foremost mathematicians of the time. For instance, within a short 
time of their correspondence, the young Dutch mathematician Christiaan Huygens 
came to Paris to discuss these problems and solutions, and interest and activity in 
this new field grew rapidly. 


The problem of the points 


Independent trials resulting in a success with probability p and a failure with proba- 
bility 1 — p are performed. What is the probability that n successes occur before m 
failures? If we think of A and B as playing a game such that A gains 1 point when a 
success occurs and B gains 1 point when a failure occurs, then the desired probability 
is the probability that A would win if the game were to be continued in a position 
where A needed n and B needed m more points to win. 


Solution We shall present two solutions. The first comes from Pascal and the second 
from Fermat. 

Let us denote by $P_{n,m}$ the probability that $n$ successes occur before $m$ failures. 
By conditioning on the outcome of the first trial, we obtain 

$$P_{n,m} = pP_{n-1,m} + (1 - p)P_{n,m-1} \qquad n \ge 1, m \ge 1$$

(Why? Reason it out.) Using the obvious boundary conditions $P_{n,0} = 0$, $P_{0,m} = 1$, 
we can solve these equations for $P_{n,m}$. Rather than go through the tedious details, 
let us instead consider Fermat’s solution. 

Fermat argued that in order for $n$ successes to occur before $m$ failures, it is 
necessary and sufficient that there be at least $n$ successes in the first $m + n - 1$ trials. 
(Even if the game were to end before a total of $m + n - 1$ trials were completed, we 
could still imagine that the necessary additional trials were performed.) This is true, 
for if there are at least $n$ successes in the first $m + n - 1$ trials, there could be at 
most $m - 1$ failures in those $m + n - 1$ trials; thus, $n$ successes would occur before 
$m$ failures. If, however, there were fewer than $n$ successes in the first $m + n - 1$ 
trials, there would have to be at least $m$ failures in that same number of trials; thus, 
$n$ successes would not occur before $m$ failures. 

Hence, since, as shown in Example 4f, the probability of exactly $k$ successes in 
$m + n - 1$ trials is $\binom{m+n-1}{k} p^k(1 - p)^{m+n-1-k}$, it follows that the desired 
probability of $n$ successes before $m$ failures is 

$$P_{n,m} = \sum_{k=n}^{m+n-1} \binom{m+n-1}{k} p^k(1 - p)^{m+n-1-k}$$
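Pascal’s recursion and Fermat’s sum can be compared numerically. The following sketch uses hypothetical function names `pascal` and `fermat`; the recursion is evaluated with memoization from the boundary conditions $P_{n,0} = 0$ and $P_{0,m} = 1$, and exact fractions are used so the two answers can be tested for equality.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def fermat(n, m, p):
    """P(n successes before m failures) as P(at least n successes
    in the first m + n - 1 trials)."""
    t = m + n - 1
    return sum(comb(t, k) * p**k * (1 - p) ** (t - k) for k in range(n, t + 1))


def pascal(n, m, p):
    """Same probability from P[n,m] = p P[n-1,m] + (1-p) P[n,m-1]."""
    @lru_cache(maxsize=None)
    def solve(a, b):
        if a == 0:
            return Fraction(1)      # no more successes needed
        if b == 0:
            return Fraction(0)      # m failures already occurred
        return p * solve(a - 1, b) + (1 - p) * solve(a, b - 1)

    return solve(n, m)


p = Fraction(1, 2)
print(fermat(2, 3, p), pascal(2, 3, p))
```

With a fair game in which A needs 2 points and B needs 3, both approaches give A a share of 11/16 of the stakes.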


The following example gives another instance where determining the probability 
that a player wins a match is made easier by assuming that the play continues even 
after the match winner has been determined. 


Service protocol in a serve and rally game 


Consider a serve and rally match (such as volleyball, badminton, or squash) between 
two players, A and B. The match consists of a sequence of rallies, with each rally 
beginning with a serve by one of the players and continuing until one of the players 
has won the rally. The winner of the rally receives a point, and the match ends when 
one of the players has won a total of n points, with that player being declared the 
winner of the match. Suppose that whenever a rally begins with A as the server, A 
wins that rally with probability $p_A$ and B wins it with probability $q_A = 1 - p_A$, 
and that a rally that begins with B as the server is won by A with probability $p_B$ and 
by B with probability $q_B = 1 - p_B$. Player A is to be the initial server. There are 
two possible server protocols that are under consideration: “winner serves,” which 
means that the winner of a rally is the server for the next rally, or “alternating serve,” 
which means that the server alternates from rally to rally, so that no two consecutive 
rallies have the same server. Thus, for instance, if $n = 3$, then the successive servers 
under the “winner serves” protocol would be A, A, B, A, A if A wins the first point, 
then B the next, then A wins the next two. On the other hand, the sequence of servers 
under the “alternating serve” protocol will always be A, B, A, B, A,... until the match 
winner is decided. If you were player A, which protocol would you prefer? 


Solution Surprisingly, it turns out that it makes no difference, in that the probability 
that A is the match winner is the same under either protocol. To show that this is 
the case, it is advantageous to suppose that the players continue to play until a total 
of 2n — 1 rallies have been completed. The first player to win n rallies would then 
be the one who has won at least n of the 2n — 1 rallies. To begin, note that if the 
alternating serve protocol is being used, then player A will serve exactly n times and 
player B will serve exactly n — 1 times in the 2n — 1 rallies. 

Now consider the winner serve protocol, again assuming that the players 
continue to play until $2n - 1$ rallies have been completed. Because it makes no 
difference who serves the “extra rallies” after the match winner has been decided, suppose 
that at the point at which the match has been decided (because one of the players 
has won $n$ points), the remainder (if there are any) of the $2n - 1$ rallies are all served 
by the player who lost the match. Note that this modified service protocol does not 
change the fact that the winner of the match will still be the player who wins at least 
$n$ of the $2n - 1$ rallies. We claim that under this modified service protocol, A will 
always serve $n$ times and B will always serve $n - 1$ times. Two cases show this. 


Case 1: A wins the match. 

Because A serves first, it follows that A’s second serve will immediately follow A’s 
first point; A’s third serve will immediately follow A’s second point; and, in particular, 
A’s nth serve will immediately follow A’s (n − 1)st point. But this will be the last serve 
of A before the match result is decided. This is so because either A will win the point 
on that serve and so have n points, or A will lose the point and so the serve will 
switch to, and remain with, B until A wins point number n. Thus, provided that A 
wins the match, it follows that A would have served a total of n times at the moment 
the match is decided. Because, by the modified service protocol, A will never again 
serve, it follows in this case that A serves exactly n times. 


Case 2: B wins the match. 

Because A serves first, B’s first serve will come immediately after B’s first point; B’s 
second serve will come immediately after B’s second point; and, in particular, B’s 
(n − 1)st serve will come immediately after B’s (n − 1)st point. But that will be the 
last serve of B before the match is decided because either B will win the point on 
that serve and so have n points, or B will lose the point and so the serve will switch 
to, and remain with, A until B wins point number n. Thus, provided that B wins the 
match, we see that B would have served a total of n — 1 times at the moment the 
match is decided. Because, by the modified service protocol, B will never again serve, 
it follows in this case that B serves exactly n — 1 times, and, as there are a total of 
2n — 1 rallies, that A serves exactly n times. 

Thus, we see that under either protocol, A will always serve n times and B will 
serve n − 1 times and the winner of the match will be the one who wins at least 
n points. But since A wins each rally that he serves with probability $p_A$ and wins 
each rally that B serves with probability $p_B$, it follows that the probability that A is 
the match winner is, under either protocol, equal to the probability that there are at 
least n successes in 2n − 1 independent trials, when n of these trials result in a success 
with probability $p_A$ and the other n − 1 trials result in a success with probability $p_B$. 
Consequently, the win probabilities for both protocols are the same. 
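
The equality of the two protocols can also be checked numerically. The following Python sketch is an illustration, not part of the original argument: it computes A's match-winning probability exactly under each protocol by recursion on the score. The function names and the sample rally probabilities are assumptions chosen for the illustration.

```python
from functools import lru_cache


def win_prob_winner_serves(n, p_a, p_b):
    """P(A wins a first-to-n match) when the winner of each rally serves next.

    p_a: probability that A wins a rally served by A
    p_b: probability that A wins a rally served by B
    A serves the first rally.
    """
    @lru_cache(maxsize=None)
    def f(a, b, a_serves):
        if a == n:
            return 1.0
        if b == n:
            return 0.0
        p = p_a if a_serves else p_b          # P(A wins the current rally)
        return p * f(a + 1, b, True) + (1 - p) * f(a, b + 1, False)

    return f(0, 0, True)


def win_prob_alternating(n, p_a, p_b):
    """Same probability when the serve alternates A, B, A, B, ..."""
    @lru_cache(maxsize=None)
    def f(a, b):
        if a == n:
            return 1.0
        if b == n:
            return 0.0
        a_serves = (a + b) % 2 == 0           # A serves rallies 1, 3, 5, ...
        p = p_a if a_serves else p_b
        return p * f(a + 1, b) + (1 - p) * f(a, b + 1)

    return f(0, 0)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, win_prob_winner_serves(n, 0.7, 0.4), win_prob_alternating(n, 0.7, 0.4))
```

For every n and every pair of rally probabilities tried, the two functions agree up to floating-point rounding, as the argument above predicts.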


Our next two examples deal with gambling problems, with the first having a 
surprisingly elegant analysis.* 


Suppose that initially there are $r$ players, with player $i$ having $n_i$ units, $n_i > 0$, 
$i = 1, \ldots, r$. At each stage, two of the players are chosen to play a game, with the winner 
of the game receiving 1 unit from the loser. Any player whose fortune drops to 0 is 
eliminated, and this continues until a single player has all $n \equiv \sum_{i=1}^{r} n_i$ units, with 
that player designated as the victor. Assuming that the results of successive games 
are independent and that each game is equally likely to be won by either of its two 
players, find $P_i$, the probability that player $i$ is the victor. 


*The remainder of this section should be considered optional. 

Solution To begin, suppose that there are $n$ players, with each player initially having 
1 unit. Consider player $i$. Each stage she plays will be equally likely to result in her 
either winning or losing 1 unit, with the results from each stage being independent. 
In addition, she will continue to play stages until her fortune becomes either 0 or 
$n$. Because this is the same for all $n$ players, it follows that each player has the same 
chance of being the victor, implying that each player has probability $1/n$ of being the 
victor. Now, suppose these players are divided into $r$ teams, with team $i$ containing 
$n_i$ players, $i = 1, \ldots, r$. Then, the probability that the victor is a member of team $i$ is 
$n_i/n$. But because 


(a) team $i$ initially has a total fortune of $n_i$ units, $i = 1, \ldots, r$, and 
(b) each game played by members of different teams is equally likely to be won 
by either player and results in the fortune of members of the winning team 
increasing by 1 and the fortune of the members of the losing team decreasing 
by 1, 

it is easy to see that the probability that the victor is from team $i$ is exactly the prob- 
ability we desire. Thus, $P_i = n_i/n$. Interestingly, our argument shows that this result 
does not depend on how the players in each stage are chosen. 
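
A quick simulation can make the result $P_i = n_i/n$ concrete. The sketch below is only an illustration under stated assumptions: the two players in each stage are chosen uniformly at random from those still in the game (one of many possible selection rules), and the function names, seed, and trial count are hypothetical choices.

```python
import random


def simulate_victor(fortunes, rng):
    """Play fair one-unit games until one player holds everything; return that player's index."""
    x = list(fortunes)
    alive = [i for i, v in enumerate(x) if v > 0]
    while len(alive) > 1:
        # rng.sample returns the two players in random order, so the first
        # (treated as the winner) is equally likely to be either of them.
        winner, loser = rng.sample(alive, 2)
        x[winner] += 1
        x[loser] -= 1
        if x[loser] == 0:
            alive.remove(loser)          # a player with fortune 0 is eliminated
    return alive[0]


def estimate_victory_probs(fortunes, trials=20000, seed=0):
    """Monte Carlo estimate of each player's probability of being the victor."""
    rng = random.Random(seed)
    wins = [0] * len(fortunes)
    for _ in range(trials):
        wins[simulate_victor(fortunes, rng)] += 1
    return [w / trials for w in wins]


if __name__ == "__main__":
    fortunes = (1, 2, 3)
    total = sum(fortunes)
    for est, n_i in zip(estimate_victory_probs(fortunes), fortunes):
        print(f"estimate {est:.3f}   n_i/n = {n_i / total:.3f}")
```

Replacing the uniform choice of players by any other selection rule should leave the estimates near $n_i/n$, which is the point of the closing remark in the solution.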


In the gambler’s ruin problem, there are only 2 gamblers, but they are not assumed 
to be of equal skill. 


The gambler’s ruin problem 


Two gamblers, A and B, bet on the outcomes of successive flips of a coin. On each 
flip, if the coin comes up heads, A collects 1 unit from B, whereas if it comes up tails, 
A pays 1 unit to B. They continue to do this until one of them runs out of money. 
If it is assumed that the successive flips of the coin are independent and each flip 
results in a head with probability p, what is the probability that A ends up with all 
the money if he starts with i units and B starts with N — i units? 


Solution Let $E$ denote the event that A ends up with all the money when he starts 
with $i$ and B starts with $N - i$, and to make clear the dependence on the initial 
fortune of A, let $P_i = P(E)$. We shall obtain an expression for $P(E)$ by conditioning 
on the outcome of the first flip as follows: Let $H$ denote the event that the first flip 
lands on heads; then 

$$\begin{aligned}
P_i = P(E) &= P(E \mid H)P(H) + P(E \mid H^c)P(H^c) \\
&= pP(E \mid H) + (1 - p)P(E \mid H^c)
\end{aligned}$$

Now, given that the first flip lands on heads, the situation after the first bet is that 
A has $i + 1$ units and B has $N - (i + 1)$. Since the successive flips are assumed to be 
independent with a common probability $p$ of heads, it follows that from that point 
on, A’s probability of winning all the money is exactly the same as if the game were 
just starting with A having an initial fortune of $i + 1$ and B having an initial fortune 
of $N - (i + 1)$. Therefore, 

$$P(E \mid H) = P_{i+1}$$

and similarly, 

$$P(E \mid H^c) = P_{i-1}$$

Hence, letting $q = 1 - p$, we obtain 

$$P_i = pP_{i+1} + qP_{i-1} \qquad i = 1, 2, \ldots, N - 1 \tag{4.2}$$
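By making use of the obvious boundary conditions $P_0 = 0$ and $P_N = 1$, we shall 
now solve Equation (4.2). Since $p + q = 1$, these equations are equivalent to 

$$pP_i + qP_i = pP_{i+1} + qP_{i-1}$$

or 

$$P_{i+1} - P_i = \frac{q}{p}(P_i - P_{i-1}) \qquad i = 1, 2, \ldots, N - 1 \tag{4.3}$$

Hence, since $P_0 = 0$, we obtain, from Equation (4.3), 

$$\begin{aligned}
P_2 - P_1 &= \frac{q}{p}(P_1 - P_0) = \frac{q}{p}P_1 \\
P_3 - P_2 &= \frac{q}{p}(P_2 - P_1) = \left(\frac{q}{p}\right)^2 P_1 \\
&\;\;\vdots \\
P_i - P_{i-1} &= \frac{q}{p}(P_{i-1} - P_{i-2}) = \left(\frac{q}{p}\right)^{i-1} P_1 \\
&\;\;\vdots \\
P_N - P_{N-1} &= \frac{q}{p}(P_{N-1} - P_{N-2}) = \left(\frac{q}{p}\right)^{N-1} P_1
\end{aligned} \tag{4.4}$$

Adding the first $i - 1$ equations of (4.4) yields 

$$P_i - P_1 = P_1\left[\left(\frac{q}{p}\right) + \left(\frac{q}{p}\right)^2 + \cdots + \left(\frac{q}{p}\right)^{i-1}\right]$$

or 

$$P_i = \begin{cases} \dfrac{1 - (q/p)^i}{1 - (q/p)}\,P_1 & \text{if } \dfrac{q}{p} \neq 1 \\[2ex] iP_1 & \text{if } \dfrac{q}{p} = 1 \end{cases}$$

Using the fact that $P_N = 1$, we obtain 

$$P_1 = \begin{cases} \dfrac{1 - (q/p)}{1 - (q/p)^N} & \text{if } p \neq \frac{1}{2} \\[2ex] \dfrac{1}{N} & \text{if } p = \frac{1}{2} \end{cases}$$

Hence, 

$$P_i = \begin{cases} \dfrac{1 - (q/p)^i}{1 - (q/p)^N} & \text{if } p \neq \frac{1}{2} \\[2ex] \dfrac{i}{N} & \text{if } p = \frac{1}{2} \end{cases} \tag{4.5}$$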


Let $Q_i$ denote the probability that B winds up with all the money when A starts 
with $i$ and B starts with $N - i$. Then, by symmetry to the situation described, and on 
replacing $p$ by $q$ and $i$ by $N - i$, it follows that 

$$Q_i = \begin{cases} \dfrac{1 - (p/q)^{N-i}}{1 - (p/q)^N} & \text{if } q \neq \frac{1}{2} \\[2ex] \dfrac{N - i}{N} & \text{if } q = \frac{1}{2} \end{cases}$$

Moreover, since $q = \frac{1}{2}$ is equivalent to $p = \frac{1}{2}$, we have, when $q \neq \frac{1}{2}$, 

$$\begin{aligned}
P_i + Q_i &= \frac{1 - (q/p)^i}{1 - (q/p)^N} + \frac{1 - (p/q)^{N-i}}{1 - (p/q)^N} \\
&= \frac{p^N - p^N(q/p)^i}{p^N - q^N} + \frac{q^N - q^N(p/q)^{N-i}}{q^N - p^N} \\
&= \frac{p^N - p^{N-i}q^i - q^N + q^i p^{N-i}}{p^N - q^N} \\
&= 1
\end{aligned}$$

This result also holds when $p = q = \frac{1}{2}$, so 

$$P_i + Q_i = 1$$


In words, this equation states that with probability 1, either A or B will wind 
up with all of the money; in other words, the probability that the game continues 
indefinitely with A’s fortune always being between 1 and N − 1 is zero. (The reader 
must be careful because, a priori, there are three possible outcomes of this gambling 
game, not two: Either A wins, or B wins, or the game goes on forever with nobody 
winning. We have just shown that this last event has probability 0.) 

As a numerical illustration of the preceding result, if A were to start with 5 units 
and B with 10, then the probability of A’s winning would be $\frac{1}{3}$ if $p$ were $\frac{1}{2}$, whereas 
it would jump to 

$$\frac{1 - \left(\frac{2}{3}\right)^5}{1 - \left(\frac{2}{3}\right)^{15}} \approx .87$$

if $p$ were .6. 
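
The closed form (4.5) is easy to evaluate directly. The short Python sketch below is offered as an illustration only; the function name `ruin_win_prob` is an assumption, not notation from the text. It reproduces the two numbers just quoted and checks that the formula satisfies the recursion (4.2).

```python
def ruin_win_prob(i, N, p):
    """Equation (4.5): probability that A, starting with i of the N units, wins everything."""
    if p == 0.5:
        return i / N
    r = (1 - p) / p                      # the ratio q/p
    return (1 - r**i) / (1 - r**N)


if __name__ == "__main__":
    print(ruin_win_prob(5, 15, 0.5))     # fair coin: i/N
    print(ruin_win_prob(5, 15, 0.6))     # roughly .87

    # Check the recursion P_i = p*P_{i+1} + q*P_{i-1} at an interior point.
    p, N, i = 0.6, 15, 7
    lhs = ruin_win_prob(i, N, p)
    rhs = p * ruin_win_prob(i + 1, N, p) + (1 - p) * ruin_win_prob(i - 1, N, p)
    print(abs(lhs - rhs) < 1e-12)
```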

A special case of the gambler’s ruin problem, which is also known as the problem 
of duration of play, was proposed to Huygens by Fermat in 1657. The version 
Huygens proposed, which he himself solved, was that A and B have 12 coins each. 
They play for these coins in a game with 3 dice as follows: Whenever 11 is thrown (by 
either; it makes no difference who rolls the dice), A gives a coin to B. Whenever 14 
is thrown, B gives a coin to A. The person who first wins all the coins wins the game. 
Since $P\{\text{roll } 11\} = \frac{27}{216}$ and $P\{\text{roll } 14\} = \frac{15}{216}$, we see from Example 4h that, for A, this 
is just the gambler’s ruin problem with $p = \frac{15}{42}$, $i = 12$, and $N = 24$. The general form 
of the gambler’s ruin problem was solved by the mathematician James Bernoulli and 
published in 1713, 8 years after his death. 

For an application of the gambler’s ruin problem to drug testing, suppose that 
two new drugs have been developed for treating a certain disease. Drug $i$ has a cure 
rate $p_i$, $i = 1, 2$, in the sense that each patient treated with drug $i$ will be cured with 
probability $p_i$. These cure rates are, however, not known, and we are interested in 
finding a method for deciding whether $p_1 > p_2$ or $p_2 > p_1$. To decide on one of these 
alternatives, consider the following test: Pairs of patients are to be treated sequentially, 
with one member of the pair receiving drug 1 and the other drug 2. The results 
for each pair are determined, and the testing stops when the cumulative number of 
cures from one of the drugs exceeds the cumulative number of cures from the other 
by some fixed, predetermined number. More formally, let 

$$X_j = \begin{cases} 1 & \text{if the patient in the } j\text{th pair that receives drug 1 is cured} \\ 0 & \text{otherwise} \end{cases}$$

$$Y_j = \begin{cases} 1 & \text{if the patient in the } j\text{th pair that receives drug 2 is cured} \\ 0 & \text{otherwise} \end{cases}$$

For a predetermined positive integer $M$, the test stops after pair $N$, where $N$ is 
the first value of $n$ such that either 

$$X_1 + \cdots + X_n - (Y_1 + \cdots + Y_n) = M$$

or 

$$X_1 + \cdots + X_n - (Y_1 + \cdots + Y_n) = -M$$

In the former case, we assert that $p_1 > p_2$ and in the latter that $p_2 > p_1$. 

In order to help ascertain whether the foregoing is a good test, one thing we 
would like to know is the probability that it leads to an incorrect decision. That 
is, for given $p_1$ and $p_2$, where $p_1 > p_2$, what is the probability that the test will 
incorrectly assert that $p_2 > p_1$? To determine this probability, note that after each 
pair is checked, the cumulative difference of cures using drug 1 versus drug 2 will go 
up by 1 with probability $p_1(1 - p_2)$ (since this is the probability that drug 1 leads to 
a cure and drug 2 does not), or go down by 1 with probability $(1 - p_1)p_2$, or remain 
the same with probability $p_1p_2 + (1 - p_1)(1 - p_2)$. Hence, if we consider only those 
pairs in which the cumulative difference changes, then the difference will go up by 1 
with probability 

$$P = P\{\text{up } 1 \mid \text{up 1 or down 1}\} = \frac{p_1(1 - p_2)}{p_1(1 - p_2) + (1 - p_1)p_2}$$

and down by 1 with probability 

$$1 - P = \frac{p_2(1 - p_1)}{p_1(1 - p_2) + (1 - p_1)p_2}$$
Thus, the probability that the test will assert that $p_2 > p_1$ is equal to the prob- 
ability that a gambler who wins each (one-unit) bet with probability $P$ will go down 
$M$ before going up $M$. But Equation (4.5), with $i = M$, $N = 2M$, shows that this 
probability is given by 

$$\begin{aligned}
P\{\text{test asserts that } p_2 > p_1\} &= 1 - \frac{1 - \left(\frac{1 - P}{P}\right)^M}{1 - \left(\frac{1 - P}{P}\right)^{2M}} \\
&= 1 - \frac{1}{1 + \left(\frac{1 - P}{P}\right)^M} \\
&= \frac{1}{1 + \gamma^M}
\end{aligned}$$

where 

$$\gamma = \frac{P}{1 - P} = \frac{p_1(1 - p_2)}{p_2(1 - p_1)}$$

For instance, if $p_1 = .6$ and $p_2 = .4$, then the probability of an incorrect decision is 
.017 when $M = 5$ and reduces to .0003 when $M = 10$. 
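
The error probability of the sequential test is simple to compute. The following Python sketch is illustrative only; the function names are assumptions. It evaluates $1/(1 + \gamma^M)$ and cross-checks it against the gambler’s ruin formula (4.5) with $i = M$ and $N = 2M$.

```python
def wrong_decision_prob(p1, p2, M):
    """P{test asserts p2 > p1} = 1 / (1 + gamma^M), for cure rates p1 > p2."""
    gamma = p1 * (1 - p2) / (p2 * (1 - p1))
    return 1 / (1 + gamma**M)


def ruin_win_prob(i, N, p):
    """Equation (4.5) for p != 1/2: probability of reaching N before 0 from i."""
    r = (1 - p) / p
    return (1 - r**i) / (1 - r**N)


if __name__ == "__main__":
    p1, p2 = 0.6, 0.4
    for M in (5, 10):
        direct = wrong_decision_prob(p1, p2, M)
        # Probability that a changing pair moves the difference up by 1.
        P = p1 * (1 - p2) / (p1 * (1 - p2) + (1 - p1) * p2)
        via_ruin = 1 - ruin_win_prob(M, 2 * M, P)
        print(M, round(direct, 4), round(via_ruin, 4))
```

Increasing $M$ lowers the error probability geometrically, at the cost of a longer test.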


A total of 64 teams are selected to play in the end of season NCAA college basket- 
ball tournament. These 64 are divided into four groups, called brackets, of size 16 
each, with the teams in each bracket being given seedings ranging from 1 (the top 
rated team in the bracket) to 16 (the lowest rated team in the bracket). The teams in 
each bracket play each other in a knockout style tournament, meaning a loss knocks 
a team out of the tournament. Naming a team by its seeding, the schedule of games 
to be played by the teams in a bracket is as given by the following graph: 


Figure 3.3 NCAA Tournament Bracket Format 
Thus, for instance, teams 1 and 16 play a game in round one, as do teams 8 
and 9, with the winners of these games then playing each other in round two. Let 
$r(i,j) = r(j,i)$, $i \neq j$, denote the round in which $i$ and $j$ would play if both teams win 
up to that point. That is, $r(i,j) = k$ if $i$ and $j$ would play in round $k$ if each won its first 
$k - 1$ games. For instance, $r(1,16) = 1$, $r(1,8) = 2$, $r(1,5) = 3$, $r(1,6) = 4$. 

Let us focus on a single one of the brackets, and let us suppose that, no matter 
what has previously occurred, if $i$ and $j$ ever play each other then $i$ will win with 
probability $p_{i,j} = 1 - p_{j,i}$. Let $P_i$ be the probability that team $i$ is the winner of the 
bracket, $i = 1, \ldots, 16$. Because $P_i$ is the probability that $i$ wins 4 games, we will compute 
the values $P_1, \ldots, P_{16}$ by determining the quantities $P_i(k)$, $i = 1, \ldots, 16$, where 
$P_i(k)$ is defined to be the probability that $i$ wins its first $k$ games. The probabilities 
$P_i(k)$ will be determined recursively, first for $k = 1$, then for $k = 2$, then for $k = 3$, 
and finally for $k = 4$, which will yield $P_i = P_i(4)$. 

Let $O_i(k) = \{j : r(i,j) = k\}$ be the set of possible opponents of $i$ in round $k$. To 
compute $P_i(k)$, we will condition on which of the teams in $O_i(k)$ reaches round $k$. 
Because a team will reach round $k$ if that team wins its first $k - 1$ games, this gives 

$$P_i(k) = \sum_{j \in O_i(k)} P(i \text{ wins its first } k \text{ games} \mid j \text{ reaches round } k)\, P_j(k - 1) \tag{4.6}$$

Now, because any team that plays a team in $O_i(k)$ in any of rounds $1, \ldots, k - 1$ is a 
possible opponent of team $i$ in round $k$, it follows that all games in rounds $1, \ldots, k - 1$ 
involving a team in $O_i(k)$ will be against another team in that set, and thus the results 
of these games do not affect which teams $i$ would play in its first $k - 1$ games. 
Consequently, whether team $i$ reaches round $k$ is independent of which team in $O_i(k)$ 
reaches round $k$. Hence, for $j \in O_i(k)$ 


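$$\begin{aligned}
&P(i \text{ wins its first } k \text{ games} \mid j \text{ reaches round } k) \\
&\quad = P(i \text{ wins its first } k - 1 \text{ games},\ i \text{ beats } j \mid j \text{ reaches round } k) \\
&\quad = P(i \text{ wins its first } k - 1 \text{ games})\, P(i \text{ beats } j \mid i \text{ and } j \text{ reach round } k) \\
&\quad = P_i(k - 1)\, p_{i,j}
\end{aligned}$$

where the next to last equality follows because whether $i$ wins its first $k - 1$ games 
is independent of the event that $j$ wins its first $k - 1$ games. Hence, from (4.6) and 
the preceding equation, we have that 

$$\begin{aligned}
P_i(k) &= \sum_{j \in O_i(k)} P_i(k - 1)\, p_{i,j}\, P_j(k - 1) \\
&= P_i(k - 1) \sum_{j \in O_i(k)} P_j(k - 1)\, p_{i,j}
\end{aligned} \tag{4.7}$$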


Starting with $P_i(0) = 1$, the preceding enables us to determine $P_i(1)$ for all $i$, which 
then enables us to determine $P_i(2)$ for all $i$, and so on, up to $P_i = P_i(4)$. 

To indicate how to apply the recursive equations (4.7), suppose that $p_{i,j} = \frac{j}{i+j}$. 
Thus, for instance, the probability that team 2 (the second seed) beats team 7 (the 
seventh seed) is $p_{2,7} = 7/9$. To compute $P_i = P_i(4)$, the probability that $i$ wins the 
bracket, start with the quantities $P_i(1)$, $i = 1, \ldots, 16$, equal to the probabilities that 
each team wins its first game. 

$$\begin{aligned}
P_1(1) &= p_{1,16} = 16/17 = 1 - P_{16}(1) \\
P_2(1) &= p_{2,15} = 15/17 = 1 - P_{15}(1) \\
P_3(1) &= p_{3,14} = 14/17 = 1 - P_{14}(1) \\
P_4(1) &= p_{4,13} = 13/17 = 1 - P_{13}(1) \\
P_5(1) &= p_{5,12} = 12/17 = 1 - P_{12}(1) \\
P_6(1) &= p_{6,11} = 11/17 = 1 - P_{11}(1) \\
P_7(1) &= p_{7,10} = 10/17 = 1 - P_{10}(1) \\
P_8(1) &= p_{8,9} = 9/17 = 1 - P_9(1)
\end{aligned}$$

The quantities $P_i(2)$ are then obtained by using the preceding along with the 
recursion (4.7). For instance, because the set of possible opponents of team 1 in 
round 2 is $O_1(2) = \{8, 9\}$, we have that 

$$P_1(2) = P_1(1)\bigl(P_8(1)\,p_{1,8} + P_9(1)\,p_{1,9}\bigr) = \frac{16}{17}\left(\frac{9}{17} \cdot \frac{8}{9} + \frac{8}{17} \cdot \frac{9}{10}\right) \approx .8415$$

The other quantities $P_2(2), \ldots, P_{16}(2)$ are obtained similarly, and are used to obtain 
the quantities $P_i(3)$, $i = 1, \ldots, 16$, which are then used to obtain $P_i = P_i(4)$, 
$i = 1, \ldots, 16$. 
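
The recursion (4.7) can be carried out mechanically. The Python sketch below is an illustration under two assumptions that should be checked against Figure 3.3: the top-to-bottom order of seeds in the bracket is taken to be the usual 1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15 (which agrees with the values $r(1,16) = 1$, $r(1,8) = 2$, $r(1,5) = 3$, $r(1,6) = 4$ quoted earlier), and the function names are hypothetical.

```python
# Assumed top-to-bottom seed order of one bracket (see the lead-in above).
BRACKET = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]


def p_win(i, j):
    """Illustrative model from the text: seed i beats seed j with probability j/(i+j)."""
    return j / (i + j)


def opponents(team, k):
    """O_i(k): the teams that `team` could meet in round k."""
    pos = BRACKET.index(team)
    size = 2**k                              # teams feeding into one round-k game
    start = (pos // size) * size
    half = size // 2
    if pos < start + half:
        other = range(start + half, start + size)
    else:
        other = range(start, start + half)
    return [BRACKET[q] for q in other]


def win_first_k(rounds=4):
    """Return [P(0), P(1), ..., P(rounds)], each a dict mapping team i to P_i(k)."""
    history = [{t: 1.0 for t in BRACKET}]    # P_i(0) = 1
    for k in range(1, rounds + 1):
        prev = history[-1]
        cur = {
            i: prev[i] * sum(prev[j] * p_win(i, j) for j in opponents(i, k))
            for i in BRACKET
        }
        history.append(cur)
    return history


if __name__ == "__main__":
    hist = win_first_k()
    print(round(hist[2][1], 4))              # P_1(2)
    for team in sorted(BRACKET):
        print(team, round(hist[4][team], 4)) # P_i = P_i(4)
```

Because exactly one team wins the bracket, the sixteen values $P_i(4)$ sum to 1, which gives a convenient sanity check on any implementation.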


Suppose that we are presented with a set of elements and we want to deter- 
mine whether at least one member of the set has a certain property. We can attack 
this question probabilistically by randomly choosing an element of the set in such a 
way that each element has a positive probability of being selected. Then the original 
question can be answered by a consideration of the probability that the randomly 
selected element does not have the property of interest. If this probability is equal 
to 1, then none of the elements of the set has the property; if it is less than 1, then at 
least one element of the set has the property. 

The final example of this section illustrates this technique. 


The complete graph having $n$ vertices is defined to be a set of $n$ points (called vertices) 
in the plane and the $\binom{n}{2}$ lines (called edges) connecting each pair of vertices. 
The complete graph having 3 vertices is shown in Figure 3.4. Suppose now that each 
edge in a complete graph having $n$ vertices is to be colored either red or blue. For a 
fixed integer $k$, a question of interest is, Is there a way of coloring the edges so that 
no set of $k$ vertices has all of its $\binom{k}{2}$ connecting edges the same color? It can be 
shown by a probabilistic argument that if $n$ is not too large, then the answer is yes. 


Figure 3.4 


The argument runs as follows: Suppose that each edge is, independently, equally 
likely to be colored either red or blue. That is, each edge is red with probability $\frac{1}{2}$. 
Number the $\binom{n}{k}$ sets of $k$ vertices and define the events $E_i$, $i = 1, \ldots, \binom{n}{k}$, as 
follows: 


$$E_i = \{\text{all of the connecting edges of the } i\text{th set of } k \text{ vertices are the same color}\}$$

Now, since each of the $\binom{k}{2}$ connecting edges of a set of $k$ vertices is equally likely 
to be either red or blue, it follows that the probability that they are all the same 
color is 

$$P(E_i) = 2\left(\frac{1}{2}\right)^{k(k-1)/2}$$

Therefore, because 

$$P\Bigl(\bigcup_i E_i\Bigr) \leq \sum_i P(E_i) \qquad \text{(Boole’s inequality)}$$

we find that $P\bigl(\bigcup_i E_i\bigr)$, the probability that there is a set of $k$ vertices all of whose 
connecting edges are similarly colored, satisfies 

$$P\Bigl(\bigcup_i E_i\Bigr) \leq \binom{n}{k}\left(\frac{1}{2}\right)^{k(k-1)/2 - 1}$$

Hence, if 

$$\binom{n}{k}\left(\frac{1}{2}\right)^{k(k-1)/2 - 1} < 1$$

or, equivalently, if 

$$\binom{n}{k} < 2^{k(k-1)/2 - 1}$$

then the probability that at least one of the $\binom{n}{k}$ sets of $k$ vertices has all of its 
connecting edges the same color is less than 1. Consequently, under the preceding 
condition on $n$ and $k$, it follows that there is a positive probability that no set of $k$ 
vertices has all of its connecting edges the same color. But this conclusion implies 
that there is at least one way of coloring the edges for which no set of $k$ vertices has 
all of its connecting edges the same color. 
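
For small cases, both the sufficient condition and an actual coloring can be produced by computer. The Python sketch below is only an illustration: it tests the condition $\binom{n}{k} < 2^{k(k-1)/2 - 1}$ and then searches for a suitable coloring by repeatedly coloring the edges at random. The function names, the seed, and the retry limit are assumptions made for the example.

```python
import random
from itertools import combinations
from math import comb


def condition_holds(n, k):
    """Sufficient condition from the text: C(n, k) < 2^(k(k-1)/2 - 1)."""
    return comb(n, k) < 2 ** (k * (k - 1) // 2 - 1)


def has_monochromatic_k_set(color, n, k):
    """True if some set of k vertices has all its connecting edges the same color."""
    for verts in combinations(range(n), k):
        if len({color[e] for e in combinations(verts, 2)}) == 1:
            return True
    return False


def random_good_coloring(n, k, seed=0, max_tries=10000):
    """Color edges red/blue at random until no k-set is monochromatic (or give up)."""
    rng = random.Random(seed)
    edges = list(combinations(range(n), 2))
    for _ in range(max_tries):
        color = {e: rng.choice("RB") for e in edges}
        if not has_monochromatic_k_set(color, n, k):
            return color
    return None


if __name__ == "__main__":
    n, k = 6, 4
    print(condition_holds(n, k))             # C(6,4) = 15 < 2^5 = 32
    coloring = random_good_coloring(n, k)
    print(coloring is not None and not has_monochromatic_k_set(coloring, n, k))
```

When the condition holds, each random attempt succeeds with positive probability, so the loop terminates quickly in practice; when it fails, the argument gives no guarantee either way.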


Remarks (a) Whereas the preceding argument established a condition on n and k 
that guarantees the existence of a coloring scheme satisfying the desired property, it 
gives no information about how to obtain such a scheme (although one possibility 
would be simply to choose the colors at random, check to see if the resulting coloring 
satisfies the property, and repeat the procedure until it does). 

(b) The method of introducing probability into a problem whose statement is 
purely deterministic has been called the probabilistic method. Other examples of 
this method are given in Theoretical Exercise 24 and Examples 2t and 2u of Chapter 7. 


†See N. Alon, J. Spencer, and P. Erdős, The Probabilistic Method (New York: John Wiley & Sons, Inc., 1992). 
3.5 P(·|F) Is a Probability 


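Conditional probabilities satisfy all of the properties of ordinary probabilities, as is 
proved by Proposition 5.1, which shows that $P(E \mid F)$ satisfies the three axioms of a 
probability. 

Proposition 5.1 
(a) $0 \leq P(E \mid F) \leq 1$. 
(b) $P(S \mid F) = 1$. 
(c) If $E_i$, $i = 1, 2, \ldots$, are mutually exclusive events, then 

$$P\Bigl(\bigcup_{i=1}^{\infty} E_i \Bigm| F\Bigr) = \sum_{i=1}^{\infty} P(E_i \mid F)$$

Proof To prove part (a), we must show that $0 \leq P(EF)/P(F) \leq 1$. The left-side 
inequality is obvious, whereas the right side follows because $EF \subset F$, which implies 
that $P(EF) \leq P(F)$. Part (b) follows because 

$$P(S \mid F) = \frac{P(SF)}{P(F)} = \frac{P(F)}{P(F)} = 1$$

Part (c) follows from 

$$\begin{aligned}
P\Bigl(\bigcup_{i=1}^{\infty} E_i \Bigm| F\Bigr) &= \frac{P\Bigl(\bigl(\bigcup_{i=1}^{\infty} E_i\bigr)F\Bigr)}{P(F)} \\
&= \frac{P\Bigl(\bigcup_{i=1}^{\infty} E_iF\Bigr)}{P(F)} \qquad \text{since } \Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr)F = \bigcup_{i=1}^{\infty} E_iF \\
&= \frac{\sum_{i=1}^{\infty} P(E_iF)}{P(F)} \\
&= \sum_{i=1}^{\infty} P(E_i \mid F)
\end{aligned}$$

where the next-to-last equality follows because $E_iE_j = \emptyset$ implies that $E_iFE_jF = \emptyset$. 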


If we define $Q(E) = P(E \mid F)$, then, from Proposition 5.1, $Q(E)$ may be regarded 
as a probability function on the events of $S$. Hence, all of the propositions previously 
proved for probabilities apply to $Q(E)$. For instance, we have 

$$Q(E_1 \cup E_2) = Q(E_1) + Q(E_2) - Q(E_1E_2)$$

or, equivalently, 

$$P(E_1 \cup E_2 \mid F) = P(E_1 \mid F) + P(E_2 \mid F) - P(E_1E_2 \mid F)$$

Also, if we define the conditional probability $Q(E_1 \mid E_2)$ by $Q(E_1 \mid E_2) = Q(E_1E_2)/Q(E_2)$, 
then, from Equation (3.1), we have 

$$Q(E_1) = Q(E_1 \mid E_2)Q(E_2) + Q(E_1 \mid E_2^c)Q(E_2^c) \tag{5.1}$$

Since 

$$\begin{aligned}
Q(E_1 \mid E_2) &= \frac{Q(E_1E_2)}{Q(E_2)} \\
&= \frac{P(E_1E_2 \mid F)}{P(E_2 \mid F)} \\
&= \frac{P(E_1E_2F)/P(F)}{P(E_2F)/P(F)} \\
&= P(E_1 \mid E_2F)
\end{aligned}$$

Equation (5.1) is equivalent to 

$$P(E_1 \mid F) = P(E_1 \mid E_2F)P(E_2 \mid F) + P(E_1 \mid E_2^cF)P(E_2^c \mid F)$$


Example 5a 


Consider Example 3a, which is concerned with an insurance company that believes 
that people can be divided into two distinct classes: those who are accident prone 
and those who are not. During any given year, an accident-prone person will have an 
accident with probability .4, whereas the corresponding figure for a person who is not 
prone to accidents is .2. What is the conditional probability that a new policyholder 
will have an accident in his or her second year of policy ownership, given that the 
policyholder has had an accident in the first year? 


Solution If we let $A$ be the event that the policyholder is accident prone and we let 
$A_i$, $i = 1, 2$, be the event that he or she has had an accident in the $i$th year, then the 
desired probability $P(A_2 \mid A_1)$ may be obtained by conditioning on whether or not 
the policyholder is accident prone, as follows: 

$$P(A_2 \mid A_1) = P(A_2 \mid AA_1)P(A \mid A_1) + P(A_2 \mid A^cA_1)P(A^c \mid A_1)$$

Now, 

$$P(A \mid A_1) = \frac{P(A_1A)}{P(A_1)} = \frac{P(A_1 \mid A)P(A)}{P(A_1)}$$

However, $P(A)$ is assumed to equal $\frac{3}{10}$, and it was shown in Example 3a that 
$P(A_1) = .26$. Hence, 

$$P(A \mid A_1) = \frac{(.4)(.3)}{.26} = \frac{6}{13}$$

Thus, 

$$P(A^c \mid A_1) = 1 - P(A \mid A_1) = \frac{7}{13}$$

Since $P(A_2 \mid AA_1) = P(A_2 \mid A) = .4$ and $P(A_2 \mid A^cA_1) = P(A_2 \mid A^c) = .2$, it follows that 

$$P(A_2 \mid A_1) = (.4)\frac{6}{13} + (.2)\frac{7}{13} \approx .29$$
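
The arithmetic of this example can be reproduced exactly with rational numbers. The Python sketch below is an illustration; the function and variable names are assumptions, while the input probabilities (.3, .4, .2) are those of the example.

```python
from fractions import Fraction


def second_year_accident_prob(p_prone=Fraction(3, 10),
                              p_acc_prone=Fraction(4, 10),
                              p_acc_not=Fraction(2, 10)):
    """P(A2 | A1), obtained by conditioning on whether the policyholder is accident prone."""
    # P(A1), by the law of total probability.
    p_a1 = p_acc_prone * p_prone + p_acc_not * (1 - p_prone)
    # P(A | A1), by Bayes's formula.
    post_prone = p_acc_prone * p_prone / p_a1
    # Given the class, the two years are conditionally independent.
    return p_acc_prone * post_prone + p_acc_not * (1 - post_prone)


if __name__ == "__main__":
    answer = second_year_accident_prob()
    print(answer, float(answer))
```

The exact value is 19/65, which rounds to the .29 obtained above.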


Example 5b 

A female chimp has given birth. It is not certain, however, which of two male chimps 
is the father. Before any genetic analysis has been performed, it is believed that 
the probability that male number 1 is the father is $p$ and the probability that male 
number 2 is the father is $1 - p$. DNA obtained from the mother, male number 1, 
and male number 2 indicates that on one specific location of the genome, the mother 
has the gene pair $(A, A)$, male number 1 has the gene pair $(a, a)$, and male number 2 
has the gene pair $(A, a)$. If a DNA test shows that the baby chimp has the gene pair 
$(A, a)$, what is the probability that male number 1 is the father? 


Solution Let all probabilities be conditional on the event that the mother has the 
gene pair $(A, A)$, male number 1 has the gene pair $(a, a)$, and male number 2 has 
the gene pair $(A, a)$. Now, let $M_i$ be the event that male number $i$, $i = 1, 2$, is the 
father, and let $B_{A,a}$ be the event that the baby chimp has the gene pair $(A, a)$. Then, 
$P(M_1 \mid B_{A,a})$ is obtained as follows: 

$$\begin{aligned}
P(M_1 \mid B_{A,a}) &= \frac{P(M_1B_{A,a})}{P(B_{A,a})} \\
&= \frac{P(B_{A,a} \mid M_1)P(M_1)}{P(B_{A,a} \mid M_1)P(M_1) + P(B_{A,a} \mid M_2)P(M_2)} \\
&= \frac{1 \cdot p}{1 \cdot p + (1/2)(1 - p)} \\
&= \frac{2p}{1 + p}
\end{aligned}$$

Because $\frac{2p}{1+p} > p$ when $p < 1$, the information that the baby’s gene pair is $(A, a)$ 
increases the probability that male number 1 is the father. This result is intuitive 
because it is more likely that the baby would have gene pair $(A, a)$ if $M_1$ is true than 
if $M_2$ is true (the respective conditional probabilities being 1 and 1/2). 


The next example deals with a problem in the theory of runs. 


Example 5c 

Independent trials, each resulting in a success with probability $p$ or a failure with 
probability $q = 1 - p$, are performed. We are interested in computing the probability 
that a run of $n$ consecutive successes occurs before a run of $m$ consecutive failures. 


Solution Let E be the event that a run of n consecutive successes occurs before a run 
of m consecutive failures. To obtain P(E), we start by conditioning on the outcome 
of the first trial. That is, letting H denote the event that the first trial results in a 
success, we obtain 


$$P(E) = pP(E \mid H) + qP(E \mid H^c) \tag{5.2}$$


Now, given that the first trial was successful, one way we can get a run of n 
successes before a run of m failures would be to have the next n — 1 trials all result 
in successes. So, let us condition on whether or not that occurs. That is, letting F be 
the event that trials 2 through n all are successes, we obtain 


$$P(E \mid H) = P(E \mid FH)P(F \mid H) + P(E \mid F^cH)P(F^c \mid H) \tag{5.3}$$

On the one hand, clearly, $P(E \mid FH) = 1$; on the other hand, if the event $F^cH$ occurs, 
then the first trial would result in a success, but there would be a failure some time 
during the next $n - 1$ trials. However, when this failure occurs, it would wipe out all 
of the previous successes, and the situation would be exactly as if we started out with 
a failure. Hence, 

$$P(E \mid F^cH) = P(E \mid H^c)$$
Because the independence of trials implies that $F$ and $H$ are independent, and because 
$P(F) = p^{n-1}$, it follows from Equation (5.3) that 

$$P(E \mid H) = p^{n-1} + (1 - p^{n-1})P(E \mid H^c) \tag{5.4}$$

We now obtain an expression for $P(E \mid H^c)$ in a similar manner. That is, we let $G$ 
denote the event that trials 2 through $m$ are all failures. Then, 

$$P(E \mid H^c) = P(E \mid GH^c)P(G \mid H^c) + P(E \mid G^cH^c)P(G^c \mid H^c) \tag{5.5}$$

Now, $GH^c$ is the event that the first $m$ trials all result in failures, so $P(E \mid GH^c) = 0$. 
Also, if $G^cH^c$ occurs, then the first trial is a failure, but there is at least one success 
in the next $m - 1$ trials. Hence, since this success wipes out all previous failures, we 
see that 

$$P(E \mid G^cH^c) = P(E \mid H)$$

Thus, because $P(G^c \mid H^c) = P(G^c) = 1 - q^{m-1}$, we obtain, from (5.5), 

$$P(E \mid H^c) = (1 - q^{m-1})P(E \mid H) \tag{5.6}$$

Solving Equations (5.4) and (5.6) yields 

$$P(E \mid H) = \frac{p^{n-1}}{p^{n-1} + q^{m-1} - p^{n-1}q^{m-1}}$$

and 

$$P(E \mid H^c) = \frac{(1 - q^{m-1})p^{n-1}}{p^{n-1} + q^{m-1} - p^{n-1}q^{m-1}}$$

Thus, 

$$\begin{aligned}
P(E) &= pP(E \mid H) + qP(E \mid H^c) \\
&= \frac{p^n + qp^{n-1}(1 - q^{m-1})}{p^{n-1} + q^{m-1} - p^{n-1}q^{m-1}} \\
&= \frac{p^{n-1}(1 - q^m)}{p^{n-1} + q^{m-1} - p^{n-1}q^{m-1}}
\end{aligned} \tag{5.7}$$

It is interesting to note that by the symmetry of the problem, the probability 
of obtaining a run of $m$ failures before a run of $n$ successes would be given by 
Equation (5.7) with $p$ and $q$ interchanged and $n$ and $m$ interchanged. Hence, this 
probability would equal 

$$P\{\text{run of } m \text{ failures before a run of } n \text{ successes}\} = \frac{q^{m-1}(1 - p^n)}{q^{m-1} + p^{n-1} - q^{m-1}p^{n-1}} \tag{5.8}$$

Since Equations (5.7) and (5.8) sum to 1, it follows that, with probability 1, either a 
run of $n$ successes or a run of $m$ failures will eventually occur. 

As an example of Equation (5.7), we note that, in tossing a fair coin, the probability 
that a run of 2 heads will precede a run of 3 tails is $\frac{7}{10}$. For 2 consecutive heads 
before 4 consecutive tails, the probability rises to $\frac{5}{6}$. 
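
Equation (5.7) can be evaluated exactly with rational arithmetic. The Python sketch below is illustrative only; the function name is an assumption. It reproduces the two fair-coin values and confirms, for one sample case, that (5.7) and (5.8) sum to 1.

```python
from fractions import Fraction


def prob_success_run_first(n, m, p):
    """Equation (5.7): P(a run of n successes occurs before a run of m failures)."""
    q = 1 - p
    denom = p ** (n - 1) + q ** (m - 1) - p ** (n - 1) * q ** (m - 1)
    return p ** (n - 1) * (1 - q**m) / denom


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(prob_success_run_first(2, 3, half))    # 2 heads before 3 tails
    print(prob_success_run_first(2, 4, half))    # 2 heads before 4 tails

    # (5.8) is (5.7) with p, q swapped and n, m swapped; the two should sum to 1.
    p = Fraction(1, 3)
    total = prob_success_run_first(3, 2, p) + prob_success_run_first(2, 3, 1 - p)
    print(total)
```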


In our next example, we return to the matching problem and obtain a solution 
by using conditional probabilities. 
At a party, n people take off their hats. The hats are then mixed up, and each person 
randomly selects one. We say that a match occurs if a person selects his or her own 
hat. What is the probability of 


(a) no matches? 


(b) exactly k matches? 


Solution (a) Let $E$ denote the event that no matches occur, and to make explicit 
the dependence on $n$, write $P_n = P(E)$. We start by conditioning on whether or not 
the first person selects his or her own hat; call these events $M$ and $M^c$, respectively. 
Then, 

$$P_n = P(E) = P(E \mid M)P(M) + P(E \mid M^c)P(M^c)$$

Clearly, $P(E \mid M) = 0$, so 

$$P_n = P(E \mid M^c)\,\frac{n - 1}{n} \tag{5.9}$$

Now, $P(E \mid M^c)$ is the probability of no matches when $n - 1$ people select from a set 
of $n - 1$ hats, when one person, called the “extra” person, does not have their hat 
in the collection, and one hat, called the “extra” hat, does not belong to any of the 
people. This can happen in either of two mutually exclusive ways: Either there are 
no matches and the extra person does not select the extra hat (this being the hat of 
the person who chose first) or there are no matches and the extra person does select 
the extra hat. The probability of the first of these events is just $P_{n-1}$, which is seen 
by regarding the extra hat as “belonging” to the extra person. Because the second 
event has probability $[1/(n - 1)]P_{n-2}$, we have 

$$P(E \mid M^c) = P_{n-1} + \frac{1}{n - 1}P_{n-2}$$

Thus, from Equation (5.9), 

$$P_n = \frac{n - 1}{n}P_{n-1} + \frac{1}{n}P_{n-2}$$

or, equivalently, 

$$P_n - P_{n-1} = -\frac{1}{n}(P_{n-1} - P_{n-2}) \tag{5.10}$$

However, since $P_n$ is the probability of no matches when $n$ people select among their 
own hats, we have 

$$P_1 = 0 \qquad P_2 = \frac{1}{2}$$

So, from Equation (5.10), 

$$\begin{aligned}
P_3 - P_2 &= -\frac{P_2 - P_1}{3} = -\frac{1}{3!} \quad \text{or} \quad P_3 = \frac{1}{2!} - \frac{1}{3!} \\
P_4 - P_3 &= -\frac{P_3 - P_2}{4} = \frac{1}{4!} \quad \text{or} \quad P_4 = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!}
\end{aligned}$$

and, in general, 

$$P_n = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + \frac{(-1)^n}{n!}$$

(b) To obtain the probability of exactly $k$ matches, we consider any fixed group 
of $k$ people. The probability that they, and only they, select their own hats is 

$$\frac{1}{n}\,\frac{1}{n - 1} \cdots \frac{1}{n - (k - 1)}\,P_{n-k} = \frac{(n - k)!}{n!}\,P_{n-k}$$

where $P_{n-k}$ is the conditional probability that the other $n - k$ people, selecting 
among their own hats, have no matches. Since there are $\binom{n}{k}$ choices of a set of $k$ 
people, the desired probability of exactly $k$ matches is 

$$\frac{P_{n-k}}{k!} = \frac{\dfrac{1}{2!} - \dfrac{1}{3!} + \cdots + \dfrac{(-1)^{n-k}}{(n - k)!}}{k!}$$
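
Both parts of the solution can be checked against a brute-force count for small $n$. The Python sketch below is an illustration; the function names are assumptions, and the convention $P_0 = 1$ (no people, hence no matches) is added here so that the formula $P_{n-k}/k!$ also covers $k = n$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def no_match_prob(n):
    """P_n from the recursion (5.10), with P_1 = 0, P_2 = 1/2 (and P_0 = 1 by convention)."""
    if n == 0:
        return Fraction(1)
    if n == 1:
        return Fraction(0)
    prev2, prev1 = Fraction(0), Fraction(1, 2)        # P_1, P_2
    for m in range(3, n + 1):
        # P_m = P_{m-1} - (P_{m-1} - P_{m-2}) / m
        prev2, prev1 = prev1, prev1 - (prev1 - prev2) / m
    return prev1


def exactly_k_matches_prob(n, k):
    """Probability of exactly k matches among n people: P_{n-k} / k!."""
    return no_match_prob(n - k) / factorial(k)


def brute_force(n, k):
    """Count the permutations of n hats with exactly k fixed points."""
    count = sum(
        1 for perm in permutations(range(n))
        if sum(person == hat for person, hat in enumerate(perm)) == k
    )
    return Fraction(count, factorial(n))


if __name__ == "__main__":
    n = 5
    for k in range(n + 1):
        print(k, exactly_k_matches_prob(n, k), brute_force(n, k))
```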


An important concept in probability theory is that of the conditional independence 
of events. We say that the events $E_1$ and $E_2$ are conditionally independent 
given $F$ if, given that $F$ occurs, the conditional probability that $E_1$ occurs is unchanged 
by information as to whether or not $E_2$ occurs. More formally, $E_1$ and $E_2$ are said to 
be conditionally independent given $F$ if 

$$P(E_1 \mid E_2F) = P(E_1 \mid F) \tag{5.11}$$

or, equivalently, 

$$P(E_1E_2 \mid F) = P(E_1 \mid F)P(E_2 \mid F) \tag{5.12}$$


The notion of conditional independence can easily be extended to more than 
two events, and this extension is left as an exercise. 

The reader should note that the concept of conditional independence was implic- 
itly employed in Example 5a, where it was assumed that the events that a poli- 
cyholder had an accident in his or her ith year, i = 1,2,..., were conditionally 
independent given whether or not the person was accident prone. The following 
example, sometimes referred to as Laplace’s rule of succession, further illustrates 
the concept of conditional independence. 


Example 5e 

Laplace’s rule of succession 


There are $k + 1$ coins in a box. When flipped, the $i$th coin will turn up heads with 
probability $i/k$, $i = 0, 1, \ldots, k$. A coin is randomly selected from the box and is then 
repeatedly flipped. If the first $n$ flips all result in heads, what is the conditional 
probability that the $(n + 1)$st flip will do likewise? 


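Solution Letting $H_n$ denote the event that the first $n$ flips all land heads, the desired 
probability is 

$$P(H_{n+1} \mid H_n) = \frac{P(H_{n+1}H_n)}{P(H_n)} = \frac{P(H_{n+1})}{P(H_n)}$$

To compute $P(H_n)$, we condition on which coin is chosen. That is, letting $C_i$ denote 
the event that coin $i$ is selected, we have that 

$$P(H_n) = \sum_{i=0}^{k} P(H_n \mid C_i)P(C_i)$$

Now, given that coin $i$ is selected, it is reasonable to assume that the outcomes will 
be conditionally independent, with each one resulting in a head with probability $i/k$. 
Hence, 

$$P(H_n \mid C_i) = (i/k)^n$$

As $P(C_i) = \frac{1}{k+1}$, this yields that 

$$P(H_n) = \frac{1}{k + 1}\sum_{i=0}^{k} (i/k)^n$$

Thus, 

$$P(H_{n+1} \mid H_n) = \frac{\sum_{i=0}^{k} (i/k)^{n+1}}{\sum_{j=0}^{k} (j/k)^n}$$

If $k$ is large, we can use the integral approximations 

$$\frac{1}{k}\sum_{i=0}^{k} \left(\frac{i}{k}\right)^{n+1} \approx \int_0^1 x^{n+1}\,dx = \frac{1}{n + 2}$$

$$\frac{1}{k}\sum_{j=0}^{k} \left(\frac{j}{k}\right)^{n} \approx \int_0^1 x^{n}\,dx = \frac{1}{n + 1}$$

So, for $k$ large, 

$$P(H_{n+1} \mid H_n) \approx \frac{n + 1}{n + 2}$$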

Updating information sequentially 


Suppose there are $n$ mutually exclusive and exhaustive possible hypotheses, with 
initial (sometimes referred to as prior) probabilities $P(H_i)$, $\sum_{i=1}^{n} P(H_i) = 1$. Now, if 
information that the event $E$ has occurred is received, then the conditional probability 
that $H_i$ is the true hypothesis (sometimes referred to as the updated or posterior 
probability of $H_i$) is 

$$P(H_i \mid E) = \frac{P(E \mid H_i)P(H_i)}{\sum_j P(E \mid H_j)P(H_j)} \tag{5.13}$$

Suppose now that we learn first that $E_1$ has occurred and then that $E_2$ has occurred. 
Then, given only the first piece of information, the conditional probability that $H_i$ is 
the true hypothesis is 

$$P(H_i \mid E_1) = \frac{P(E_1 \mid H_i)P(H_i)}{P(E_1)} = \frac{P(E_1 \mid H_i)P(H_i)}{\sum_j P(E_1 \mid H_j)P(H_j)}$$

whereas given both pieces of information, the conditional probability that $H_i$ is the 
true hypothesis is $P(H_i \mid E_1E_2)$, which can be computed by 

$$P(H_i \mid E_1E_2) = \frac{P(E_1E_2 \mid H_i)P(H_i)}{\sum_j P(E_1E_2 \mid H_j)P(H_j)}$$

One might wonder, however, when one can compute $P(H_i \mid E_1E_2)$ by using the right 
side of Equation (5.13) with $E = E_2$ and with $P(H_j)$ replaced by $P(H_j \mid E_1)$, 
$j = 1, \ldots, n$. That is, when is it legitimate to regard $P(H_j \mid E_1)$, $j \geq 1$, as the prior 
probabilities and then use (5.13) to compute the posterior probabilities? 


Solution The answer is that the preceding is legitimate, provided that for each
j = 1, ..., n, the events $E_1$ and $E_2$ are conditionally independent, given $H_j$. For if
this is the case, then

$$P(E_1 E_2 \mid H_j) = P(E_2 \mid H_j)\, P(E_1 \mid H_j), \quad j = 1, \ldots, n$$

Therefore,

$$\begin{aligned}
P(H_i \mid E_1 E_2) &= \frac{P(E_2 \mid H_i)\, P(E_1 \mid H_i)\, P(H_i)}{P(E_1 E_2)} \\
&= \frac{P(E_2 \mid H_i)\, P(E_1 H_i)}{P(E_1 E_2)} \\
&= \frac{P(E_2 \mid H_i)\, P(H_i \mid E_1)\, P(E_1)}{P(E_1 E_2)} \\
&= \frac{P(E_2 \mid H_i)\, P(H_i \mid E_1)}{Q(1,2)}
\end{aligned}$$

where $Q(1,2) = \frac{P(E_1 E_2)}{P(E_1)}$. Since the preceding equation is valid for all i, we obtain,
upon summing,

$$1 = \sum_{i=1}^{n} P(H_i \mid E_1 E_2) = \sum_{i=1}^{n} \frac{P(E_2 \mid H_i)\, P(H_i \mid E_1)}{Q(1,2)}$$

showing that

$$Q(1,2) = \sum_{i=1}^{n} P(E_2 \mid H_i)\, P(H_i \mid E_1)$$

and yielding the result

$$P(H_i \mid E_1 E_2) = \frac{P(E_2 \mid H_i)\, P(H_i \mid E_1)}{\sum_{j=1}^{n} P(E_2 \mid H_j)\, P(H_j \mid E_1)}$$


For instance, suppose that one of two coins is chosen to be flipped. Let $H_i$ be the
event that coin i, i = 1, 2, is chosen, and suppose that when coin i is flipped, it lands
on heads with probability $p_i$, i = 1, 2. Then, the preceding equations show that to
sequentially update the probability that coin 1 is the one being flipped, given the
results of the previous flips, all that must be saved after each new flip is the
conditional probability that coin 1 is the coin being used. That is, it is not necessary to
keep track of all earlier results.
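
The two-coin remark can be made concrete with a short sketch. It updates the posterior one flip at a time, keeping only the current posterior, and compares the result with a single update that uses the whole flip sequence at once. The helper names (`bayes_update`, `sequential_posterior`, `batch_posterior`) and the heads probabilities used in the demonstration are illustrative assumptions. Flips are taken to be conditionally independent given the coin, which is the condition derived above.

```python
from fractions import Fraction


def bayes_update(prior, likelihoods):
    """One application of Equation (5.13): each posterior probability is
    proportional to likelihood times prior, normalized to sum to 1."""
    weights = [l * p for l, p in zip(likelihoods, prior)]
    total = sum(weights)
    return [w / total for w in weights]


def sequential_posterior(prior, heads_probs, flips):
    """Update after every flip; only the current posterior is stored."""
    posterior = list(prior)
    for flip in flips:
        likelihoods = [p if flip == "H" else 1 - p for p in heads_probs]
        posterior = bayes_update(posterior, likelihoods)
    return posterior


def batch_posterior(prior, heads_probs, flips):
    """Update once, using the joint likelihood of the entire flip sequence."""
    heads = flips.count("H")
    tails = len(flips) - heads
    likelihoods = [p ** heads * (1 - p) ** tails for p in heads_probs]
    return bayes_update(prior, likelihoods)


if __name__ == "__main__":
    # Assumed values for illustration: each coin equally likely to be chosen,
    # coin 1 lands heads with probability 3/10 and coin 2 with probability 1/2.
    prior = [Fraction(1, 2), Fraction(1, 2)]
    heads_probs = [Fraction(3, 10), Fraction(1, 2)]
    flips = "HHTH"
    print(sequential_posterior(prior, heads_probs, flips))
    print(batch_posterior(prior, heads_probs, flips))
```

Both calls print the same pair of fractions, so storing the running posterior loses nothing relative to storing the full history of flips.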


Summary


For events E and F, the conditional probability of E given
that F has occurred is denoted by P(E|F) and is defined by

$$P(E \mid F) = \frac{P(EF)}{P(F)}$$

The identity

$$P(E_1 E_2 \cdots E_n) = P(E_1)\, P(E_2 \mid E_1) \cdots P(E_n \mid E_1 \cdots E_{n-1})$$

is known as the multiplication rule of probability.

A valuable identity is

$$P(E) = P(E \mid F)\, P(F) + P(E \mid F^c)\, P(F^c)$$

which can be used to compute P(E) by “conditioning” on
whether F occurs.

$P(H)/P(H^c)$ is called the odds of the event H. The
identity

$$\frac{P(H \mid E)}{P(H^c \mid E)} = \frac{P(H)\, P(E \mid H)}{P(H^c)\, P(E \mid H^c)}$$

shows that when new evidence E is obtained, the value of
the odds of H becomes its old value multiplied by the ratio
of the conditional probability of the new evidence when H
is true to the conditional probability when H is not true.

Let $F_i$, i = 1, ..., n, be mutually exclusive events
whose union is the entire sample space. The identity

$$P(F_j \mid E) = \frac{P(E \mid F_j)\, P(F_j)}{\sum_{i=1}^{n} P(E \mid F_i)\, P(F_i)}$$

is known as Bayes’s formula. If the events $F_i$, i = 1, ..., n,
are competing hypotheses, then Bayes’s formula shows
how to compute the conditional probabilities of these
hypotheses when additional evidence E becomes available.

The denominator of Bayes’s formula uses that

$$P(E) = \sum_{i=1}^{n} P(E \mid F_i)\, P(F_i)$$

which is called the law of total probability.

If P(EF) = P(E)P(F), then we say that the events
E and F are independent. This condition is equivalent to
P(E|F) = P(E) and to P(F|E) = P(F). Thus, the events E
and F are independent if knowledge of the occurrence of
one of them does not affect the probability of the other.

The events $E_1, \ldots, E_n$ are said to be independent if,
for any subset $E_{i_1}, \ldots, E_{i_r}$ of them,

$$P(E_{i_1} \cdots E_{i_r}) = P(E_{i_1}) \cdots P(E_{i_r})$$

For a fixed event F, P(E|F) can be considered to be a
probability function on the events E of the sample space.


Problems


3.1. Mary thinks of a number from 1 to 9 and John tries to
guess it. Set up a sample space and compute the probability
that John’s guess is correct.


3.2. Lobsters are caught in morning and afternoon sessions.
The number of lobsters that can be caught in each
session are 0, 1, 2, 3, 4, or 5, each with probability 1/6.
Find the probability of at least one lobster being caught
in the afternoon given that more lobsters were caught in
the morning.


3.3. Use Equation (2.1) to compute in a hand of bridge
the conditional probability that East has 3 spades given
that North and South have a combined total of 8
spades.


3.4. What is the probability that at least one of a pair of
fair dice lands on 6, given that the sum of the dice is i,
i = 2, 3, ..., 12?


3.5. An urn contains 6 white and 9 black balls. If 4 balls are
to be randomly selected without replacement, what is the
probability that the first 2 selected are white and the last 2
black?


3.6. Consider an urn containing 12 balls, of which 8 are
white. A sample of size 4 is to be drawn with replacement
(without replacement). What is the conditional probability
(in each case) that the first and third balls drawn will be
white given that the sample drawn contains exactly 3 white
balls?


3.7. The king comes from a family of 2 children. What is
the probability that the other child is his sister?


3.8. A couple has 2 children. What is the probability that 
both are girls if the older of the two is a girl? 


3.9. Consider 3 urns. Urn A contains 2 white and 4 red 
balls, urn B contains 8 white and 4 red balls, and urn C con- 
tains 1 white and 3 red balls. If 1 ball is selected from each 
urn, what is the probability that the ball chosen from urn 
A was white given that exactly 2 white balls were selected? 


3.10. Three cards are randomly selected, without replace- 
ment, from an ordinary deck of 52 playing cards. Compute 
the conditional probability that the first card selected is a 
spade given that the second and third cards are spades. 


3.11. Two cards are randomly chosen without replacement 
from an ordinary deck of 52 cards. Let B be the event that 
both cards are aces, let $A_s$ be the event that the ace of
spades is chosen, and let A be the event that at least one
ace is chosen. Find


(a) $P(B \mid A_s)$
(b) $P(B \mid A)$


3.12. Suppose distinct values are written on each of 3 
cards, which are then randomly given the designations A, 
B, and C. Given that card A’s value is less than card B’s 
value, find the probability it is also less than card C’s value. 


3.13. A recent college graduate is planning to take the first 
three actuarial examinations in the coming summer. She 
will take the first actuarial exam in June. If she passes that 
exam, then she will take the second exam in July, and if 
she also passes that one, then she will take the third exam 
in September. If she fails an exam, then she is not allowed
to take any others. The probability that she passes the first
exam is .9. If she passes the first exam, then the conditional
probability that she passes the second one is .8, and if she
passes both the first and the second exams, then the
conditional probability that she passes the third exam is .7.


(a) What is the probability that she passes all three exams? 


(b) Given that she did not pass all three exams, what is the 
conditional probability that she failed the second exam? 


3.14. Suppose that an ordinary deck of 52 cards (which 
contains 4 aces) is randomly divided into 4 hands of 13 
cards each. We are interested in determining p, the probability
that each hand has an ace. Let $E_i$ be the event
that the ith hand has exactly one ace. Determine
$p = P(E_1 E_2 E_3 E_4)$ by using the multiplication rule.


3.15. An urn initially contains 5 white and 7 black balls. 
Each time a ball is selected, its color is noted and it is 
replaced in the urn along with 2 other balls of the same 
color. Compute the probability that 


(a) the first 2 balls selected are black and the next 2 are 
white; 
(b) of the first 4 balls selected, exactly 2 are black. 


3.16. An ectopic pregnancy is twice as likely to develop 
when the pregnant woman is a smoker as it is when she is a 
nonsmoker. If 32 percent of women of childbearing age are 
smokers, what percentage of women having ectopic preg- 
nancies are smokers? 


3.17. Ninety-eight percent of all babies survive delivery. 
However, 15 percent of all births involve Cesarean (C) 
sections, and when a C section is performed, the baby sur- 
vives 96 percent of the time. If a randomly chosen pregnant 
woman does not have a C section, what is the probability 
that her baby survives? 


3.18. In a certain community, 36 percent of the families 
own a dog and 22 percent of the families that own a dog 
also own a cat. In addition, 30 percent of the families own 
a cat. What is 


(a) the probability that a randomly selected family owns 
both a dog and a cat? 

(b) the conditional probability that a randomly selected 
family owns a dog given that it owns a cat? 


3.19. A total of 46 percent of the voters in a certain city 
classify themselves as Independents, whereas 30 percent 
classify themselves as Liberals and 24 percent say that they 
are Conservatives. In a recent local election, 35 percent 
of the Independents, 62 percent of the Liberals, and 58 
percent of the Conservatives voted. A voter is chosen at 
random. Given that this person voted in the local election, 
what is the probability that he or she is 


(a) an Independent? 


(b) a Liberal? 
(c) a Conservative? 


(d) What percent of voters participated in the local 
election? 


3.20. A total of 48 percent of the women and 37 per- 
cent of the men who took a certain “quit smoking” class 
remained nonsmokers for at least one year after complet- 
ing the class. These people then attended a success party 
at the end of a year. If 62 percent of the original class was 
male, 


(a) what percentage of those attending the party were 
women? 

(b) what percentage of the original class attended the 
party? 


3.21. Fifty-two percent of the students at a certain college 
are females. Five percent of the students in this college 
are majoring in computer science. Two percent of the stu- 
dents are women majoring in computer science. If a stu- 
dent is selected at random, find the conditional probability 
that 


(a) the student is female given that the student is majoring 
in computer science; 
(b) this student is majoring in computer science given that 
the student is female. 


3.22. A total of 500 married working couples were polled 
about their annual salaries, with the following information 
resulting: 


Husband 
Wife Less than More than 
$125,000 $125,000 
Less than $125,000 212 198 


More than $125,000 36 54 


For instance, in 36 of the couples, the wife earned more 
and the husband earned less than $125,000. If one of the 
couples is randomly chosen, what is 


(a) the probability that the husband earns less than 
$125,000? 

(b) the conditional probability that the wife earns more 
than $125,000 given that the husband earns more than this 
amount? 

(c) the conditional probability that the wife earns more 
than $125,000 given that the husband earns less than this 
amount? 


3.23. A red die, a blue die, and a yellow die (all six sided) 
are rolled. We are interested in the probability that the 
number appearing on the blue die is less than that appear- 
ing on the yellow die, which is less than that appearing on 


the red die. That is, with B, Y, and R denoting, respec- 
tively, the number appearing on the blue, yellow, and red 
die, we are interested in P(B < Y < R). 


(a) What is the probability that no two of the dice land on 
the same number? 

(b) Given that no two of the dice land on the same num- 
ber, what is the conditional probability that B < Y < R? 


(c) What is P(B < Y < R)? 


3.24. Urn I contains 2 white and 4 red balls, whereas urn II 
contains 1 white and 1 red ball. A ball is randomly chosen 
from urn I and put into urn II, and a ball is then randomly 
selected from urn II. What is 


(a) the probability that the ball selected from urn II is 
white? 

(b) the conditional probability that the transferred ball 
was white given that a white ball is selected from urn II? 


3.25. Maqsuma goes to Charlie’s to buy groceries and 
to Macellu to buy meat. Charlie’s gives gift tokens to 
customers 10 percent of the times they visit the outlet, 
whereas Macellu offers gift tokens to customers 5 percent 
of the time. Both outlets offer tokens to Maqsuma. For 
every 5 visits to Charlie’s, she goes to Macellu once. Given 
that she is coming home with a token, what is the proba- 
bility that she is coming from Macellu? 


3.26. Each of 2 balls is painted either black or gold and 
then placed in an urn. Suppose that each ball is colored 
black with probability 1/2 and that these events are
independent.


(a) Suppose that you obtain information that the gold 
paint has been used (and thus at least one of the balls is 
painted gold). Compute the conditional probability that 
both balls are painted gold. 

(b) Suppose now that the urn tips over and 1 ball falls out. 
It is painted gold. What is the probability that both balls 
are gold in this case? Explain. 


3.27. The following method was proposed to estimate the 
number of people over the age of 50 who reside in a town 
of known population 100,000: “As you walk along the 
streets, keep a running count of the percentage of people 
you encounter who are over 50. Do this for a few days; 
then multiply the percentage you obtain by 100,000 to 
obtain the estimate.” Comment on this method. 


Hint: Let p denote the proportion of people in the town
who are over 50. Furthermore, let $\alpha_1$ denote the proportion
of time that a person under the age of 50 spends in
the streets, and let $\alpha_2$ be the corresponding value for those
over 50. What quantity does the method suggested
estimate? When is the estimate approximately equal to p?


3.28. Suppose that 5 percent of men and 0.25 percent of 
women are color blind. A color-blind person is chosen
at random. What is the probability of this person being
male? Assume that there are an equal number of males 
and females. What if the population consisted of twice as 
many males as females? 


3.29. Xiku Road has $n_1$, $n_2$, $n_3$, and $n_4$ houses with 1, 2,
3, and 4 occupants, respectively. Two random selection 
without replacement strategies are being contemplated to 
obtain a sample of the residents. In the first strategy, res- 
idents are selected with equal probability. In the second 
strategy, houses are first randomly selected and then resi- 
dents from these houses are selected. 


Work out, in terms of $n_1$, $n_2$, $n_3$, and $n_4$, the conditional
probability of a resident from a 3-occupant house
being selected given that the first selection came from a 
4-occupant residence under both strategies. 


3.30. Suppose that an ordinary deck of 52 cards is shuffled 
and the cards are then turned over one at a time until the 
first ace appears. Given that the first ace is the 20th card 
to appear, what is the conditional probability that the card 
following it is the 


(a) ace of spades? 
(b) two of clubs? 


3.31. Twenty persons are attending a meeting in a hall, 
8 of whom left the hall and have returned at least once 
so far. Four persons have just left and are coming back. 
This coincides with the selection of a subcommittee of 4 
persons who will be chosen by everyone. What is the prob- 
ability that all subcommittee members will have never left 
the hall? 


3.32. Consider two boxes, one containing 1 black and 1 
white marble, the other 2 black and 1 white marble. A 
box is selected at random, and a marble is drawn from 
it at random. What is the probability that the marble is 
black? What is the probability that the first box was the 
one selected given that the marble is white? 


3.33. Ms. Aquina has just had a biopsy on a possibly can- 
cerous tumor. Not wanting to spoil a weekend family 
event, she does not want to hear any bad news in the next 
few days. But if she tells the doctor to call only if the news 
is good, then if the doctor does not call, Ms. Aquina can 
conclude that the news is bad. So, being a student of prob- 
ability, Ms. Aquina instructs the doctor to flip a coin. If it 
comes up heads, the doctor is to call if the news is good and 
not call if the news is bad. If the coin comes up tails, the 
doctor is not to call. In this way, even if the doctor doesn’t 
call, the news is not necessarily bad. Let $\alpha$ be the probability
that the tumor is cancerous; let $\beta$ be the conditional
probability that the tumor is cancerous given that the doctor
does not call.


(a) Which should be larger, $\alpha$ or $\beta$?
(b) Find $\beta$ in terms of $\alpha$, and prove your answer in part (a).


3.34. Patients are randomly allocated in KGV hospital
wards subject to the availability of beds. Wards 1, 2, 3, 4, 
and 5 can accommodate 25, 30, 65, 40, and 30 beds, respec- 
tively. Records reveal that the probabilities of a patient 
leaving a ward dead are given by .43, .49, .49, .54, and .44, 
respectively. 


(a) What is the probability that a patient who is admitted 
to KGV hospital dies? 

(b) What is the probability that a patient who has died was 
admitted in ward 3? 


3.35. Gloria or Dominic often misplaces a key that should 
be kept in the bedroom. The probability that Gloria leaves 
the key in the kitchen is .3, and the probability that 
Dominic forgets it in the conservatory is .45. Gloria uses 
the key twice as frequently as Dominic. 


(a) What is the probability that the key is in its proper 
place? 

(b) Given that the key is not in the bedroom, what is the 
probability that Gloria has misplaced it? 


3.36. In Example 3f, suppose that the new evidence is sub- 
ject to different possible interpretations and in fact shows 
only that it is 90 percent likely that the criminal pos- 
sesses the characteristic in question. In this case, how likely 
would it be that the suspect is guilty (assuming, as before, 
that he has the characteristic)? 


3.37. With probability .6, the present was hidden by mom; 
with probability .4, it was hidden by dad. When mom hides 
the present, she hides it upstairs 70 percent of the time and 
downstairs 30 percent of the time. Dad is equally likely to 
hide it upstairs or downstairs. 


(a) What is the probability that the present is upstairs? 


(b) Given that it is downstairs, what is the probability it 
was hidden by dad? 


3.38. Stores A, B, and C have 50, 75, and 100 employees, 
respectively, and 50, 60, and 70 percent of them respec- 
tively are women. Resignations are equally likely among 
all employees, regardless of sex. One woman employee 
resigns. What is the probability that she works in store C? 


3.39. Three finalists of a song festival comprise a female 
soloist, a male soloist, and a female duo. It is also known 
that the male soloist did not win. 


(a) What is the probability that the festival was won by the 
duo? 


(b) What is the probability that the duo finished third? 


3.40. Urn A has 5 white and 7 black balls. Urn B has 3 
white and 12 black balls. We flip a fair coin. If the outcome 
is heads, then a ball from urn A is selected, whereas if the 
outcome is tails, then a ball from urn B is selected. Sup- 
pose that a white ball is selected. What is the probability 
that the coin landed tails? 


3.41. In Example 3a, what is the probability that someone 
has an accident in the second year given that he or she had 
no accidents in the first year? 


3.42. Consider a sample of size 3 drawn in the following 
manner: We start with an urn containing 5 white and 7 red 
balls. At each stage, a ball is drawn and its color is noted. 
The ball is then returned to the urn, along with an addi- 
tional ball of the same color. Find the probability that the 
sample will contain exactly 


(a) 0 white balls; 
(b) 1 white ball; 
(c) 3 white balls; 
(d) 2 white balls. 


3.43. A deck of cards is shuffled and then divided into two 
halves of 26 cards each. A card is drawn from one of the 
halves; it turns out to be an ace. The ace is then placed in 
the second half-deck. The half is then shuffled, and a card 
is drawn from it. Compute the probability that this drawn 
card is an ace. 


Hint: Condition on whether or not the interchanged card 
is selected. 


3.44. Twelve percent of all U.S. households are in 
California. A total of 1.3 percent of all U.S. households 
earn more than $250,000 per year, while a total of 3.3 per- 
cent of all California households earn more than $250,000 
per year. 


(a) What proportion of all non-California households earn 
more than $250,000 per year? 


(b) Given that a randomly chosen U.S. household earns 
more than $250,000 per year, what is the probability it is a 
California household? 


3.45. There are 3 coins in a box. One is a two-headed coin, 
another is a fair coin, and the third is a biased coin that 
comes up heads 75 percent of the time. When one of the 
3 coins is selected at random and flipped, it shows heads. 
What is the probability that it was the two-headed coin? 


3.46. Three prisoners are informed by their jailer that one 
of them has been chosen at random to be executed and 
the other two are to be freed. Prisoner A asks the jailer to 
tell him privately which of his fellow prisoners will be set 
free, claiming that there would be no harm in divulging 
this information because he already knows that at least 
one of the two will go free. The jailer refuses to answer the 
question, pointing out that if A knew which of his fellow 
prisoners were to be set free, then his own probability of 
being executed would rise from 1/3 to 1/2 because he would
then be one of two prisoners. What do you think of the 
jailer’s reasoning? 


3.47. There is a 30 percent chance that A can fix her busted 
computer. If A cannot, then there is a 40 percent chance 
that her friend B can fix it. 


(a) Find the probability it will be fixed by either A or B. 
(b) If it is fixed, what is the probability it will be fixed by B?


3.48. In any given year, a male automobile policyholder
will make a claim with probability $p_m$, and a female
policyholder will make a claim with probability $p_f$, where
$p_f \neq p_m$. The fraction of the policyholders that are male
is $\alpha$, $0 < \alpha < 1$. A policyholder is randomly chosen. If $A_i$
denotes the event that this policyholder will make a claim
in year i, show that


$$P(A_2 \mid A_1) > P(A_1)$$


Give an intuitive explanation of why the preceding 
inequality is true. 


3.49. An urn contains 5 white and 10 black balls. A fair die 
is rolled and that number of balls is randomly chosen from 
the urn. What is the probability that all of the balls selected 
are white? What is the conditional probability that the die 
landed on 3 if all the balls selected are white? 


3.50. Each of 2 cabinets identical in appearance has 2 
drawers. Cabinet A contains a silver coin in each drawer, 
and cabinet B contains a silver coin in one of its draw- 
ers and a gold coin in the other. A cabinet is randomly 
selected, one of its drawers is opened, and a silver coin is 
found. What is the probability that there is a silver coin in 
the other drawer? 


3.51. Prostate cancer is the most common type of cancer 
found in males. As an indicator of whether a male has 
prostate cancer, doctors often perform a test that mea- 
sures the level of the prostate-specific antigen (PSA) that is 
produced only by the prostate gland. Although PSA levels 
are indicative of cancer, the test is notoriously unreliable. 
Indeed, the probability that a noncancerous man will have 
an elevated PSA level is approximately .135, increasing to 
approximately .268 if the man does have cancer. If, on the 
basis of other factors, a physician is 70 percent certain that 
a male has prostate cancer, what is the conditional proba- 
bility that he has the cancer given that 


(a) the test indicated an elevated PSA level? 
(b) the test did not indicate an elevated PSA level? 


Repeat the preceding calculation, this time assuming that 
the physician initially believes that there is a 30 percent 
chance that the man has prostate cancer. 


3.52. Suppose that an insurance company classifies people 
into one of three classes: good risks, average risks, and bad 
risks. The company’s records indicate that the probabilities 
that good-, average-, and bad-risk persons will be involved 
in an accident over a 1-year span are, respectively, .05, .15,
and .30. If 20 percent of the population is a good risk, 50
percent an average risk, and 30 percent a bad risk, what 
proportion of people have accidents in a fixed year? If 
policyholder A had no accidents in 2012, what is the prob- 
ability that he or she is a good risk? is an average risk? 


3.53. A worker has asked her supervisor for a letter of 
recommendation for a new job. She estimates that there 
is an 80 percent chance that she will get the job if she 
receives a strong recommendation, a 40 percent chance if 
she receives a moderately good recommendation, and a 
10 percent chance if she receives a weak recommendation. 
She further estimates that the probabilities that the rec- 
ommendation will be strong, moderate, and weak are .7, .2, 
and .1, respectively. 


(a) How certain is she that she will receive the new job 
offer? 

(b) Given that she does receive the offer, how likely should 
she feel that she received a strong recommendation? a 
moderate recommendation? a weak recommendation? 
(c) Given that she does not receive the job offer, how 
likely should she feel that she received a strong recommen- 
dation? a moderate recommendation? a weak recommen- 
dation? 


3.54. Players A, B, C, D are randomly lined up. The first 
two players in line then play a game; the winner of that 
game then plays a game with the person who is third in 
line; the winner of that game then plays a game with the 
person who is fourth in line. The winner of that last game 
is considered the winner of the tournament. If A wins each 
game it plays with probability p, determine the probability 
that A is the winner of the tournament. 


3.55. Players 1,2,3 are playing a tournament. Two of these 
three players are randomly chosen to play a game in round 
one, with the winner then playing the remaining player in 
round two. The winner of round two is the tournament victor.
Assume that all games are independent and that i wins
when playing against j with probability $\frac{i}{i+j}$.


(a) Find the probability that 1 is the tournament victor. 


(b) If 1 is the tournament victor, find the conditional prob- 
ability that 1 did not play in round one. 


3.56. Suppose there are two coins, with coin 1 landing 
heads when flipped with probability .3 and coin 2 with 
probability .5. Suppose also that we randomly select one 
of these coins and then continually flip it. Let $H_j$ denote
the event that flip j, $j \geq 1$, lands heads. Also, let $C_i$ be the
event that coin i was chosen, i = 1, 2.


(a) Find $P(H_1)$.

(b) Find $P(H_2 \mid H_1)$.

(c) Find $P(C_1 \mid H_1)$.

(d) Find $P(H_2 H_3 H_4 \mid H_1)$.


3.57. In a 7 game series played with two teams, the first
team to win a total of 4 games is the winner. Suppose that 
each game played is independently won by team A with 
probability p. 


(a) Given that one team leads 3 to 0, what is the probability
that it is team A that leads?

(b) Given that one team leads 3 to 0, what is the probability
that team wins the series?


3.58. A parallel system functions whenever at least one 
of its components works. Consider a parallel system of 
n components, and suppose that each component works 
independently with probability 1/2. Find the conditional
probability that component 1 works given that the system 
is functioning. 


3.59. If you had to construct a mathematical model for 
events E and F, as described in parts (a) through (e),
would you assume that they were independent events? 
Explain your reasoning. 


(a) E is the event that a businesswoman has blue eyes, and 
F is the event that her secretary has blue eyes. 

(b) E is the event that a professor owns a car, and F is the
event that he is listed in the telephone book. 

(c) E is the event that a man is under 6 feet tall, and F is 
the event that he weighs more than 200 pounds. 

(d) E is the event that a woman lives in the United States, 
and F is the event that she lives in the Western Hemi- 
sphere. 

(e) E is the event that it will rain tomorrow, and F is the 
event that it will rain the day after tomorrow. 


3.60. In a class, there are 4 first-year boys, 6 first-year girls, 
and 6 sophomore boys. How many sophomore girls must 
be present if sex and class are to be independent when a 
student is selected at random? 


3.61. Suppose that you continually collect coupons and 
that there are m different types. Suppose also that each 
time a new coupon is obtained, it is a type i coupon with 
probability $p_i$, i = 1, ..., m. Suppose that you have just
collected your nth coupon. What is the probability that it is a
new type? 


Hint: Condition on the type of this coupon. 


3.62. A simplified model for the movement of the price of 
a stock supposes that on each day the stock’s price either 
moves up 1 unit with probability p or moves down 1 unit
with probability 1 — p. The changes on different days are 
assumed to be independent. 


(a) What is the probability that after 2 days the stock will 
be at its original price? 

(b) What is the probability that after 3 days the stock’s 
price will have increased by 1 unit? 


(c) Given that after 3 days the stock’s price has increased 
by 1 unit, what is the probability that it went up on the first 
day? 


3.63. Suppose that we want to generate the outcome of the 
flip of a fair coin, but that all we have at our disposal is a 
biased coin that lands on heads with some unknown probability
p that need not be equal to 1/2. Consider the following
procedure for accomplishing our task: 


1. Flip the coin. 

2. Flip the coin again. 

3. If both flips land on heads or both land on tails, return 
to step 1. 

4. Let the result of the last flip be the result of the experi- 
ment. 


(a) Show that the result is equally likely to be either heads 
or tails. 

(b) Could we use a simpler procedure that continues to flip 
the coin until the last two flips are different and then lets 
the result be the outcome of the final flip? 
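
Steps 1 through 4 of the procedure can be simulated to check an analytical answer to part (a) empirically. The sketch below is only a numerical sanity check under an assumed bias p, not a proof. The function names `fair_flip_from_biased` and `estimate_heads_frequency`, the seed, and the trial count are assumptions introduced for this illustration.

```python
import random


def fair_flip_from_biased(p, rng):
    """Steps 1-4: flip the biased coin twice, start over while the two flips
    agree, and otherwise return the result of the last (second) flip."""
    while True:
        first_is_heads = rng.random() < p
        second_is_heads = rng.random() < p
        if first_is_heads != second_is_heads:
            return "H" if second_is_heads else "T"


def estimate_heads_frequency(p, trials, seed=0):
    """Fraction of runs of the procedure that end in heads."""
    rng = random.Random(seed)
    heads = sum(fair_flip_from_biased(p, rng) == "H" for _ in range(trials))
    return heads / trials


if __name__ == "__main__":
    for p in (0.2, 0.5, 0.8):
        print(p, estimate_heads_frequency(p, trials=20000))
```

The printed frequencies can be compared with the value claimed in part (a) for several choices of p.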


3.64. Independent flips of a coin that lands on heads with 
probability p are made. What is the probability that the 
first four outcomes are 


(a) H, H, H, H? 

(b) T, H, H, H?

(c) What is the probability that the pattern T, H, H, H 
occurs before the pattern H, H, H, H? 


Hint for part (c): How can the pattern H, H, H, H occur 
first? 


3.65. The color of a person’s eyes is determined by a single 
pair of genes. If they are both blue-eyed genes, then the 
person will have blue eyes; if they are both brown-eyed 
genes, then the person will have brown eyes; and if one 
of them is a blue-eyed gene and the other a brown-eyed 
gene, then the person will have brown eyes. (Because of 
the latter fact, we say that the brown-eyed gene is domi- 
nant over the blue-eyed one.) A newborn child indepen- 
dently receives one eye gene from each of its parents, and 
the gene it receives from a parent is equally likely to be 
either of the two eye genes of that parent. Suppose that 
Smith and both of his parents have brown eyes, but Smith’s 
sister has blue eyes. 


(a) What is the probability that Smith possesses a blue- 
eyed gene? 

(b) Suppose that Smith’s wife has blue eyes. What is the 
probability that their first child will have blue eyes? 

(c) If their first child has brown eyes, what is the probabil- 
ity that their next child will also have brown eyes? 


3.66. Genes relating to albinism are denoted by A and a. 
Only those people who receive the a gene from both par- 
ents will be albino. Persons having the gene pair A, a are 
normal in appearance and, because they can pass on the 
trait to their offspring, are called carriers. Suppose that a 
normal couple has two children, exactly one of whom is 
an albino. Suppose that the nonalbino child mates with a 
person who is known to be a carrier for albinism. 


(a) What is the probability that their first offspring is an 
albino? 

(b) What is the conditional probability that their second 
offspring is an albino given that their firstborn is not? 


3.67. Barbara and Dianne go target shooting. Suppose that 
each of Barbara’s shots hits a wooden duck target with 
probability $p_1$, while each shot of Dianne’s hits it with
probability $p_2$. Suppose that they shoot simultaneously at
the same target. If the wooden duck is knocked over (indi- 
cating that it was hit), what is the probability that 


(a) both shots hit the duck? 
(b) Barbara’s shot hit the duck? 


What independence assumptions have you made? 


3.68. A and B are involved in a duel. The rules of the duel 
are that they are to pick up their guns and shoot at each 
other simultaneously. If one or both are hit, then the duel 
is over. If both shots miss, then they repeat the process. 
Suppose that the results of the shots are independent and 
that each shot of A will hit B with probability $p_A$, and each
shot of B will hit A with probability $p_B$. What is


(a) the probability that A is not hit? 


(b) the probability that both duelists are hit? 


(c) the probability that the duel ends after the nth round 
of shots? 

(d) the conditional probability that the duel ends after the
nth round of shots given that A is not hit? 

(e) the conditional probability that the duel ends after the 
nth round of shots given that both duelists are hit? 


3.69. Assume, as in Example 3h, that 64 percent of twins 
are of the same sex. Given that a newborn set of twins is of 
the same sex, what is the conditional probability that the 
twins are identical? 


3.70. The probability of the closing of the ith relay in the 
circuits shown in Figure 3.5 is given by $p_i$, i = 1, 2, 3, 4, 5.
If all relays function independently, what is the probability 
that a current flows between A and B for the respective 
circuits? 


Hint for (b): Condition on whether relay 3 closes. 


3.71. An engineering system consisting of n components 
is said to be a k-out-of-n system (k ≤ n) if the system
functions if and only if at least k of the n components func- 
tion. Suppose that all components function independently 
of one another. 


(a) If the ith component functions with probability $P_i$, i =
1, 2, 3, 4, compute the probability that a 2-out-of-4 system
functions. 

(b) Repeat part (a) for a 3-out-of-5 system. 

(c) Repeat for a k-out-of-n system when all the $P_i$ equal p
(that is, $P_i$ = p, i = 1, 2, ..., n).
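
The definition of a k-out-of-n system in Problem 3.71 translates directly into a brute-force computation. The sketch below enumerates all working/failed configurations for unequal component probabilities and uses a binomial sum for the equal-probability case; the function names are illustrative assumptions, and the enumeration is only practical for small n.

```python
from itertools import product
from math import comb

def k_out_of_n(probs, k):
    """P(at least k of the independent components function)."""
    total = 0.0
    for states in product([0, 1], repeat=len(probs)):
        if sum(states) >= k:
            weight = 1.0
            for works, p in zip(states, probs):
                weight *= p if works else 1 - p
            total += weight
    return total

def k_out_of_n_equal(n, k, p):
    """Same probability when every component functions with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

if __name__ == "__main__":
    print(k_out_of_n([0.9, 0.8, 0.7, 0.6], 2))
    print(k_out_of_n_equal(5, 3, 0.5))
```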


3.72. In Problem 3.70a, find the conditional probability 
that relays 1 and 2 are both closed given that a current 
flows from A to B. 


3.73. A certain organism possesses a pair of each of 5 dif- 
ferent genes (which we will designate by the first 5 letters 
of the English alphabet). Each gene appears in 2 forms 


[Figure 3.5: Circuits (a) and (b) for Problem 3.70]
(which we designate by lowercase and capital letters). The
capital letter will be assumed to be the dominant gene, in 
the sense that if an organism possesses the gene pair xX, 
then it will outwardly have the appearance of the X gene. 
For instance, if X stands for brown eyes and x for blue 
eyes, then an individual having either gene pair XX or xX 
will have brown eyes, whereas one having gene pair xx will 
have blue eyes. The characteristic appearance of an organ- 
ism is called its phenotype, whereas its genetic constitution 
is called its genotype. (Thus, 2 organisms with respective 
genotypes aA, bB, cc, dD, ee and AA, BB, cc, DD, ee would 
have different genotypes but the same phenotype.) In a 
mating between 2 organisms, each one contributes, at ran- 
dom, one of its gene pairs of each type. The 5 contributions 
of an organism (one of each of the 5 types) are assumed 
to be independent and are also independent of the con- 
tributions of the organism’s mate. In a mating between 
organisms having genotypes aA, bB, cC, dD, eE and aa,
bB, cc, Dd, ee what is the probability that the progeny will 
(i) phenotypically and (ii) genotypically resemble 


(a) the first parent? 
(b) the second parent? 
(c) either parent? 

(d) neither parent? 


3.74. There is a 50-50 chance that the queen carries the 
gene for hemophilia. If she is a carrier, then each prince 
has a 50-50 chance of having hemophilia. If the queen has 
had three princes without the disease, what is the proba- 
bility that the queen is a carrier? If there is a fourth prince, 
what is the probability that he will have hemophilia? 
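
Problem 3.74 is a direct application of Bayes's formula, and exact fractions make the arithmetic easy to check. The sketch below assumes, as the problem implies, that a prince cannot have the disease when the queen is not a carrier; the names are illustrative.

```python
from fractions import Fraction

def posterior_carrier(healthy_princes, prior=Fraction(1, 2)):
    """P(queen is a carrier | the given number of princes are all healthy)."""
    like_carrier = Fraction(1, 2) ** healthy_princes
    like_noncarrier = Fraction(1)   # assumption: sons of a non-carrier are healthy
    numerator = prior * like_carrier
    return numerator / (numerator + (1 - prior) * like_noncarrier)

carrier = posterior_carrier(3)
# A fourth prince is affected only if the queen is a carrier.
fourth_prince_affected = carrier * Fraction(1, 2)

if __name__ == "__main__":
    print(carrier, fourth_prince_affected)
```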


3.75. A town council of 7 members contains a steering 
committee of size 3. New ideas for legislation go first to the 
steering committee and then on to the council as a whole 
if at least 2 of the 3 committee members approve the leg- 
islation. Once at the full council, the legislation requires a 
majority vote (of at least 4) to pass. Consider a new piece 
of legislation, and suppose that each town council member 
will approve it, independently, with probability p. What 
is the probability that a given steering committee mem- 
ber’s vote is decisive in the sense that if that person’s vote 
were reversed, then the final fate of the legislation would 
be reversed? What is the corresponding probability for a 
given council member not on the steering committee? 


3.76. Suppose that each child born to a couple is equally 
likely to be a boy or a girl, independently of the sex dis- 
tribution of the other children in the family. For a couple 
having 5 children, compute the probabilities of the follow- 
ing events: 

(a) All children are of the same sex. 

(b) The 3 eldest are boys and the others girls. 

(c) Exactly 3 are boys. 

(d) The 2 oldest are girls. 


(e) There is at least 1 girl. 


3.77. A and B alternate rolling a pair of dice, stopping 
either when A rolls the sum 9 or when B rolls the sum 
6. Assuming that A rolls first, find the probability that the 
final roll is made by A. 
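
One way to approach Problem 3.77 is to condition on the first round: either A stops at once, or both players miss and the game starts over. The sketch below is a hedged illustration of that argument with exact fractions; the helper names are assumptions.

```python
from fractions import Fraction
from itertools import product

def sum_probability(target):
    """P(a pair of fair dice sums to `target`)."""
    rolls = list(product(range(1, 7), repeat=2))
    return Fraction(sum(1 for a, b in rolls if a + b == target), len(rolls))

def prob_a_makes_final_roll(p_a, p_b):
    """Solve x = p_a + (1 - p_a)(1 - p_b) x for x."""
    return p_a / (1 - (1 - p_a) * (1 - p_b))

if __name__ == "__main__":
    print(prob_a_makes_final_roll(sum_probability(9), sum_probability(6)))
```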


3.78. In a certain village, it is traditional for the eldest son 
(or the older son in a two-son family) and his wife to be 
responsible for taking care of his parents as they age. In 
recent years, however, the women of this village, not want- 
ing that responsibility, have not looked favorably upon 
marrying an eldest son. 


(a) If every family in the village has two children, what 
proportion of all sons are older sons? 

(b) If every family in the village has three children, what 
proportion of all sons are eldest sons? 

Assume that each child is, independently, equally likely to 
be either a boy or a girl. 


3.79. Suppose that E and F are mutually exclusive events 
of an experiment. Show that if independent trials of this 
experiment are performed, then E will occur before F with
probability P(E)/[P(E) + P(F)]. 


3.80. Consider an unending sequence of independent tri- 
als, where each trial is equally likely to result in any of the 
outcomes 1, 2, or 3. Given that outcome 3 is the last of the 
three outcomes to occur, find the conditional probability 
that 


(a) the first trial results in outcome 1; 
(b) the first two trials both result in outcome 1. 


3.81. A and B play a series of games. Each game is inde- 
pendently won by A with probability p and by B with 
probability 1 − p. They stop when the total number of
wins of one of the players is two greater than that of the 
other player. The player with the greater number of total 
wins is declared the winner of the series. 


(a) Find the probability that a total of 4 games are played. 
(b) Find the probability that A is the winner of the series. 


3.82. In successive rolls of a pair of fair dice, what is the 
probability of getting 2 sevens before 6 even numbers? 


3.83. In a certain contest, the players are of equal skill and 
the probability is 1/2 that a specified one of the two contestants
will be the victor. In a group of $2^n$ players, the players
are paired off against each other at random. The $2^{n-1}$ winners
are again paired off randomly, and so on, until a single
winner remains. Consider two specified contestants, A and
B, and define the events $A_i$, i ≤ n, and E by


$A_i$: A plays in exactly i contests


E: A and B never play each other


(a) Find $P(A_i)$, i = 1, ..., n.


(b) Find P(E).
(c) Let $P_n$ = P(E). Show that


$$P_n = \frac{1}{2^n - 1} + \frac{2^n - 2}{2^n - 1}\left(\frac{1}{2}\right)^2 P_{n-1}$$


and use this formula to check the answer you obtained in 
part (b). 

Hint: Find P(E) by conditioning on which of the events
$A_i$, i = 1, ..., n occur. In simplifying your answer, use the
algebraic identity


$$\sum_{i=1}^{n-1} i x^{i-1} = \frac{1 - n x^{n-1} + (n-1)x^n}{(1-x)^2}$$


For another approach to solving this problem, note that
there are a total of $2^n - 1$ games played.

(d) Explain why $2^n - 1$ games are played.

Number these games, and let $B_i$ denote the event that A
and B play each other in game i, i = 1, ..., $2^n - 1$.

(e) What is $P(B_i)$?

(f) Use part (e) to find P(E). 


3.84. An investor owns shares in a stock whose present 
value is 25. She has decided that she must sell her stock if it 
goes either down to 10 or up to 40. If each change of price 
is either up 1 point with probability .55 or down 1 point 
with probability .45, and the successive changes are inde- 
pendent, what is the probability that the investor retires a 
winner? 
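
Problem 3.84 has the structure of the gambler's ruin problem: the price performs a random walk between the barriers 10 and 40. The sketch below evaluates the standard gambler's ruin formula with exact fractions; the function name and argument names are assumptions made for illustration.

```python
from fractions import Fraction

def prob_reach_upper(start, lower, upper, p_up):
    """P(walk hits `upper` before `lower`), steps +1 w.p. p_up and -1 otherwise."""
    ratio = (1 - p_up) / p_up
    i, n = start - lower, upper - lower
    if ratio == 1:
        return Fraction(i, n)          # symmetric walk
    return (1 - ratio**i) / (1 - ratio**n)

p_up = Fraction(55, 100)
win = prob_reach_upper(25, 10, 40, p_up)

if __name__ == "__main__":
    print(float(win))
```

The result can be sanity-checked against the one-step recursion f(i) = p f(i + 1) + (1 - p) f(i - 1), which the formula satisfies at every interior price.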


3.85. A and B flip coins. A starts and continues flipping 
until a tail occurs, at which point B starts flipping and con- 
tinues until there is a tail. Then A takes over, and so on. Let 
$P_1$ be the probability of the coin landing on heads when A
flips and $P_2$ when B flips. The winner of the game is the
first one to get 


(a) 2 heads in a row; 

(b) a total of 2 heads; 

(c) 3 heads in a row; 

(d) a total of 3 heads. 

In each case, find the probability that A wins. 


3.86. Die A has 4 red and 2 white faces, whereas die B has 
2 red and 4 white faces. A fair coin is flipped once. If it 
lands on heads, the game continues with die A; if it lands 
on tails, then die B is to be used. 


(a) Show that the probability of red at any throw is 1/2.

(b) If the first two throws result in red, what is the proba- 
bility of red at the third throw? 

(c) If red turns up at the first two throws, what is the prob- 
ability that it is die A that is being used? 
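
The three parts of Problem 3.86 all come from mixing over which die is in use. The following sketch, with illustrative names, computes them with exact fractions by conditioning on the coin flip.

```python
from fractions import Fraction

PRIOR = {"A": Fraction(1, 2), "B": Fraction(1, 2)}   # fair coin picks the die
P_RED = {"A": Fraction(4, 6), "B": Fraction(2, 6)}   # red faces out of six

def prob_reds(n):
    """P(the first n throws are all red)."""
    return sum(PRIOR[d] * P_RED[d] ** n for d in PRIOR)

red_any_throw = prob_reds(1)
third_red_given_two = prob_reds(3) / prob_reds(2)
die_a_given_two = PRIOR["A"] * P_RED["A"] ** 2 / prob_reds(2)

if __name__ == "__main__":
    print(red_any_throw, third_red_given_two, die_a_given_two)
```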


3.87. An urn contains 12 balls, of which 4 are white. Three
players—A, B, and C—successively draw from the urn, A 
first, then B, then C, then A, and so on. The winner is the 
first one to draw a white ball. Find the probability of win- 
ning for each player if 

(a) each ball is replaced after it is drawn; 

(b) the balls that are withdrawn are not replaced. 
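
Both variants of Problem 3.87 can be computed exactly by tracking the probability that no white ball has been drawn yet. The sketch below is one possible organization; the function names and default arguments are assumptions.

```python
from fractions import Fraction

def win_probs_with_replacement(white=4, total=12, players=3):
    """Each draw is white w.p. white/total; sum the geometric series per player."""
    p = Fraction(white, total)
    miss = 1 - p
    cycle = 1 - miss**players          # P(someone wins within one full cycle)
    return [miss**k * p / cycle for k in range(players)]

def win_probs_without_replacement(white=4, total=12, players=3):
    """Draw index d means d non-white balls have already been removed."""
    probs = [Fraction(0)] * players
    still_playing = Fraction(1)
    for draw in range(total - white + 1):
        p_white = Fraction(white, total - draw)
        probs[draw % players] += still_playing * p_white
        still_playing *= 1 - p_white
    return probs

if __name__ == "__main__":
    print(win_probs_with_replacement())
    print(win_probs_without_replacement())
```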


3.88. Repeat Problem 3.87 when each of the 3 players 
selects from his own urn. That is, suppose that there are 
3 different urns of 12 balls with 4 white balls in each urn. 


3.89. Let S = {1,2,...,n} and suppose that A and B are, 
independently, equally likely to be any of the $2^n$ subsets
(including the null set and S itself) of S. 


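(a) Show that

$$P\{A \subset B\} = \left(\frac{3}{4}\right)^n$$


Hint: Let N(B) denote the number of elements in B. Use


$$P\{A \subset B\} = \sum_{i=0}^{n} P\{A \subset B \mid N(B) = i\}\,P\{N(B) = i\}$$


(b) Show that $P\{AB = \emptyset\} = \left(\frac{3}{4}\right)^n$.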
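
For small n, both identities in Problem 3.89 can be confirmed by enumerating every ordered pair of subsets. The sketch below represents subsets as bitmasks, which is an implementation choice made here for illustration; it is a check, not a substitute for the proof.

```python
from fractions import Fraction

def subset_probabilities(n):
    """Return (P{A subset of B}, P{AB empty}) for independent uniform subsets of {1..n}."""
    size = 2**n
    contained = sum(1 for a in range(size) for b in range(size) if a & b == a)
    disjoint = sum(1 for a in range(size) for b in range(size) if a & b == 0)
    return Fraction(contained, size * size), Fraction(disjoint, size * size)

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, subset_probabilities(n), Fraction(3, 4) ** n)
```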


3.90. Consider an eight team tournament with the format 
given in Figure 3.6. If the probability that team i beats 
team j if they play is j/(i + j), find the probability that team 1
wins the tournament. 

3.91. Consider Example 2a, but now suppose that when 
the key is in a certain pocket, there is a 10 percent chance 
that a search of that pocket will not find the key. Let R 
and L be, respectively, the events that the key is in the 
right-hand pocket of the jacket and that it is in the left-hand
pocket. Also, let $S_R$ be the event that a search of
the right-hand jacket pocket will be successful in finding
the key, and let $U_L$ be the event that a search of the left-hand
jacket pocket will be unsuccessful and, thus, not find
the key. Find $P(S_R \mid U_L)$, the conditional probability that a
search of the right-hand pocket will find the key given that
a search of the left-hand pocket did not, by


(a) using the identity

$$P(S_R \mid U_L) = \frac{P(S_R U_L)}{P(U_L)}$$

determining $P(S_R U_L)$ by conditioning on whether or not
the key is in the right-hand pocket, and determining $P(U_L)$
by conditioning on whether or not the key is in the left-hand
pocket;


(b) using the identity


$$P(S_R \mid U_L) = P(S_R \mid R U_L)P(R \mid U_L) + P(S_R \mid R^c U_L)P(R^c \mid U_L)$$


3.92. In Example 5e, what is the conditional probability
that the ith coin was selected given that the first n trials 
all result in heads? 


3.93. In Laplace’s rule of succession (Example 5e), are the 
outcomes of the successive flips independent? Explain. 


3.94. A person tried by a 3-judge panel is declared guilty if 
at least 2 judges cast votes of guilty. Suppose that when the 
defendant is in fact guilty, each judge will independently 
vote guilty with probability .7 whereas when the defen- 
dant is in fact innocent, this probability drops to .2. If 70 
percent of defendants are guilty, compute the conditional 
probability that judge number 3 votes guilty given that 


(a) judges 1 and 2 vote guilty; 
(b) judges 1 and 2 cast 1 guilty and 1 not guilty vote; 
(c) judges 1 and 2 both cast not guilty votes. 


Let $E_i$, i = 1, 2, 3 denote the event that judge i casts a guilty
vote. Are these events independent? Are they condition- 
ally independent? Explain. 
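
The conditional probabilities in Problem 3.94 follow from mixing over whether the defendant is guilty, since the votes are independent only given that state. The sketch below encodes that with exact fractions; the names are illustrative assumptions.

```python
from fractions import Fraction

P_GUILTY = Fraction(7, 10)                       # P(defendant is guilty)
VOTE_GUILTY = {True: Fraction(7, 10),            # P(judge votes guilty | guilty)
               False: Fraction(2, 10)}           # P(judge votes guilty | innocent)

def prob_votes(guilty_votes, not_guilty_votes):
    """P(specified judges cast these votes), mixing over the true state."""
    total = Fraction(0)
    for is_guilty, prior in ((True, P_GUILTY), (False, 1 - P_GUILTY)):
        v = VOTE_GUILTY[is_guilty]
        total += prior * v**guilty_votes * (1 - v)**not_guilty_votes
    return total

def judge3_guilty_given(guilty_votes, not_guilty_votes):
    """P(judge 3 votes guilty | judges 1 and 2 cast the given votes)."""
    return (prob_votes(guilty_votes + 1, not_guilty_votes)
            / prob_votes(guilty_votes, not_guilty_votes))

if __name__ == "__main__":
    for g, ng in ((2, 0), (1, 1), (0, 2)):
        print(g, ng, judge3_guilty_given(g, ng))
```

Comparing these three values with the unconditional probability `prob_votes(1, 0)` is one way to explore the independence questions at the end of the problem.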


3.95. Each of n workers is independently qualified to do an 
incoming job with probability p. If none of them is quali- 
fied then the job is rejected; otherwise the job is assigned 
to a randomly chosen one of the qualified workers. Find
the probability that worker 1 is assigned to the first incoming
job. Hint: Condition on whether or not at least one
worker is qualified.


[Figure 3.6: tournament format for Problem 3.90]


3.96. Suppose in the preceding problem that n = 2 and 
that worker i is qualified with probability $p_i$, i = 1, 2.


(a) Find the probability that worker 1 is assigned to the
first incoming job. 

(b) Given that worker 1 is assigned to the first job, find 
the conditional probability that worker 2 was quali- 
fied for that job. 


3.97. Each member of a population of size n is, indepen- 
dently of other members, female with probability p or 
male with probability 1 − p. Two individuals of the same
sex will, independently of other pairs, be friends with probability
α, whereas two individuals of opposite sex will be
friends with probability β. Let $A_{k,r}$ be the event that persons
k and r are friends.


(a) Find $P(A_{1,2})$.
(b) Are $A_{1,2}$ and $A_{1,3}$ independent?


(c) Are $A_{1,2}$ and $A_{1,3}$ conditionally independent given the
sex of person 1?


(d) Find $P(A_{1,2}A_{1,3})$.


Theoretical Exercises


3.1. Show that if P(A ∩ B) > 0, then


$$P(A \cup B \mid A \cap B) \ge P(A \cap B \mid A \cup B)$$


3.2. Events A and B are mutually exclusive. Work out the 
following: 
$P(A \mid B)$, $P(A \cup B \mid A)$,


$P(B \mid A^c)$, $P(A \cup B \mid A \cap B)$


3.3. A continent-wide television game show involves N 
countries. Country i nominates $n_i$ cities including its capital
city. Two selection processes are being considered to select 
the city that will participate first. Process 1 entails select- 
ing a country first and then picking a city of that country. 
Process 2 entails selecting one city from a pool of all partic- 
ipating cities. Considering that the producers of the show 
would prefer selecting cities that are not capitals, which 
process should they choose? You may use the following 
inequality: 


$$\sum_{i=1}^{N} \frac{1}{n_i} \ge \frac{N^2}{\sum_{i=1}^{N} n_i}$$


3.4. A ball is in any one of n boxes and is in the ith box
with probability $p_i$. If the ball is in box i, a search of that
box will uncover it with probability $\alpha_i$. Show that the conditional
probability that the ball is in box j, given that a
search of box i did not uncover it, is


$$\frac{p_j}{1 - \alpha_i p_i} \quad \text{if } j \ne i$$

$$\frac{(1 - \alpha_i)p_i}{1 - \alpha_i p_i} \quad \text{if } j = i$$

3.5. (a) Prove that if E and F are mutually exclusive, then


$$P(E \mid E \cup F) = \frac{P(E)}{P(E) + P(F)}$$

(b) Prove that if $E_i$, i ≥ 1 are mutually exclusive, then


$$P\left(E_j \,\Big|\, \bigcup_{i=1}^{\infty} E_i\right) = \frac{P(E_j)}{\sum_{i=1}^{\infty} P(E_i)}$$


3.6. A and B are independent events. Show that 


$$P(A^c \cap B^c) = P(A^c)P(B^c)$$

Show that A and $A^c$ are not independent when 0 <
P(A) < 1.


3.7. (a) m females and n males attend a meeting in a hall.
They leave randomly at the end of the meeting. What is the 
probability the first person to leave is of the same gender 
as the last one to leave? 


(b) Three species of rats, $R_1$, $R_2$, and $R_3$, whose corresponding
abundances are given by $r_1$, $r_2$, and $r_3$, are being
eradicated from a confined rural space. Assuming that rat
elimination occurs randomly, what is the probability that
$R_3$ is the last species to be eradicated?


3.8. Consider three events A, B, and C with P(B) > 0. 


(a) Given that 
P(A|B) = P(C|B) 


show that $P(A^c \mid B) = P(C^c \mid B)$ using direct definitions.
(b) Show using direct definitions that under independence 
of events A and B, 


$$P(A^c \mid B^c) = P(A^c)$$


is valid. 


(c) Show by counterexample that the independence con- 
dition in (b) is necessary. You may consider the situation 
of two completely disconnected persons who are asked to 
randomly pick one integer from 1 to 10. You may then con- 
struct events A and B for whom the condition you elicit 
does not apply, and also for which the equation above is 
not valid. 


3.9. Consider two independent tosses of a fair coin. Let 
A be the event that the first toss results in heads, let B 
be the event that the second toss results in heads, and let 
C be the event that in both tosses the coin lands on the 
same side. Show that the events A, B, and C are pairwise 
independent —that is, A and B are independent, A and C 
are independent, and B and C are independent—but not 
independent. 


3.10. Two percent of women age 45 who participate in rou- 
tine screening have breast cancer. Ninety percent of those 
with breast cancer have positive mammographies. Eight 
percent of the women who do not have breast cancer will 
also have positive mammographies. Given that a woman 
has a positive mammography, what is the probability she 
has breast cancer? 
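
Exercise 3.10 is a standard Bayes's formula computation. The sketch below uses exact fractions; the parameter names `sensitivity` and `false_positive_rate` are labels introduced here for the 90 percent and 8 percent figures and do not appear in the exercise.

```python
from fractions import Fraction

def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) by Bayes's formula."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

p_cancer_given_positive = posterior(Fraction(2, 100), Fraction(90, 100), Fraction(8, 100))

if __name__ == "__main__":
    print(p_cancer_given_positive, float(p_cancer_given_positive))
```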


3.11. In each of n independent tosses of a coin, the coin 
lands on heads with probability p. How large need n be 
so that the probability of obtaining at least one head is at 
least 1/2?




3.12. Show that if 0 ≤ $a_i$ ≤ 1, i = 1, 2, ..., then


$$\sum_{i=1}^{\infty}\left[a_i \prod_{j=1}^{i-1}(1 - a_j)\right] + \prod_{i=1}^{\infty}(1 - a_i) = 1$$

Hint: Suppose that an infinite number of coins are to be
flipped. Let $a_i$ be the probability that the ith coin lands on
heads, and consider when the first head occurs. 


3.13. A system moves probabilistically amongst states i =
1, 2, ..., M. Let $P^n_{ij}$ be the probability that the system moves
from state i to state j in n steps. Show that:


$$P^n_{ij} = \sum_{k=1}^{M} P^1_{ik} P^{n-1}_{kj}$$


*3.14. Job assignments in a pool of n workers are affected
each time a worker is replaced in course of repeated random
selections. By conditioning on a particular person
being chosen twice before anyone else, show that the probability
that exactly one person is chosen twice in k selections is


$$\frac{(n-1)(n-2)\cdots(n-k+2)}{n^{k-1}}\,(k-1)$$


3.15. Independent trials that result in a success with probability
p are successively performed until a total of r successes
is obtained. Show that the probability that exactly n
trials are required is


$$\binom{n-1}{r-1} p^r (1-p)^{n-r}$$


Use this result to solve the problem of the points (Example 4j).


Hint: In order for it to take n trials to obtain r successes,
how many successes must occur in the first n − 1 trials?


3.16. Independent trials that result in a success with probability
p and a failure with probability 1 − p are called
Bernoulli trials. Let $P_n$ denote the probability that n
Bernoulli trials result in an even number of successes (0
being considered an even number). Show that


$$P_n = p(1 - P_{n-1}) + (1 - p)P_{n-1}, \quad n \ge 1$$

and use this formula to prove (by induction) that


$$P_n = \frac{1 + (1 - 2p)^n}{2}$$
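
Before attempting the induction in Exercise 3.16, it can help to confirm numerically that the recursion and the closed form agree. The sketch below iterates the recursion from $P_0 = 1$ (zero trials give zero successes); the function names are illustrative.

```python
def even_successes_recursive(n, p):
    """Iterate P_n = p(1 - P_{n-1}) + (1 - p) P_{n-1} starting from P_0 = 1."""
    prob = 1.0
    for _ in range(n):
        prob = p * (1 - prob) + (1 - p) * prob
    return prob

def even_successes_closed_form(n, p):
    """The closed form (1 + (1 - 2p)^n) / 2 stated in the exercise."""
    return (1 + (1 - 2 * p) ** n) / 2

if __name__ == "__main__":
    for n in range(6):
        print(n, even_successes_recursive(n, 0.3), even_successes_closed_form(n, 0.3))
```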


3.17. Suppose that n independent trials are performed,
with trial i being a success with probability 1/(2i + 1).


Let $P_n$ denote the probability that the total number of successes
that result is an odd number.


(a) Find $P_n$ for n = 1, 2, 3, 4, 5.

(b) Conjecture a general formula for $P_n$.

(c) Derive a formula for $P_n$ in terms of $P_{n-1}$.

(d) Verify that your conjecture in part (b) satisfies the 
recursive formula in part (c). Because the recursive for- 
mula has a unique solution, this then proves that your 
conjecture is correct. 


3.18. Let $Q_n$ denote the probability that no run of 3 consecutive
heads appears in n tosses of a fair coin. Show that


$$Q_n = \frac{1}{2}Q_{n-1} + \frac{1}{4}Q_{n-2} + \frac{1}{8}Q_{n-3}$$

$$Q_0 = Q_1 = Q_2 = 1$$


Find $Q_8$.


Hint: Condition on the first tail. 
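
The recursion in Exercise 3.18 is easy to iterate exactly, and for small n it can be cross-checked by listing every toss sequence. The sketch below does both; the function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def no_three_heads(n):
    """Q_n from the recursion, with Q_0 = Q_1 = Q_2 = 1."""
    q = [Fraction(1)] * 3
    for _ in range(3, n + 1):
        q.append(q[-1] / 2 + q[-2] / 4 + q[-3] / 8)
    return q[n]

def no_three_heads_brute(n):
    """Q_n by enumerating all 2^n equally likely sequences."""
    good = sum(1 for seq in product("HT", repeat=n) if "HHH" not in "".join(seq))
    return Fraction(good, 2**n)

if __name__ == "__main__":
    print(no_three_heads(8), no_three_heads_brute(8))
```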


3.19. Consider the gambler’s ruin problem, with the 
exception that A and B agree to play no more than n 
games. Let $P_{n,i}$ denote the probability that A winds up
with all the money when A starts with i and B starts with
N − i. Derive an equation for $P_{n,i}$ in terms of $P_{n-1,i+1}$ and
$P_{n-1,i-1}$, and compute $P_{7,3}$, N = 5.


3.20. Consider two urns, each containing both white and 
black balls. The probabilities of drawing white balls from 
the first and second urns are, respectively, p and p’. Balls 
are sequentially selected with replacement as follows: 
With probability α, a ball is initially chosen from the first
urn, and with probability 1 − α, it is chosen from the second
urn. The subsequent selections are then made according
to the rule that whenever a white ball is drawn (and
replaced), the next ball is drawn from the same urn, but
when a black ball is drawn, the next ball is taken from the
other urn. Let $\alpha_n$ denote the probability that the nth ball
is chosen from the first urn. Show that


$$\alpha_{n+1} = \alpha_n(p + p' - 1) + 1 - p', \quad n \ge 1$$


and use this formula to prove that

$$\alpha_n = \frac{1 - p'}{2 - p - p'} + \left(\alpha - \frac{1 - p'}{2 - p - p'}\right)(p + p' - 1)^{n-1}$$

Let $P_n$ denote the probability that the nth ball selected
is white. Find $P_n$. Also, compute $\lim_{n\to\infty} \alpha_n$ and
$\lim_{n\to\infty} P_n$.


3.21. The Ballot Problem. In an election, candidate A 
receives n votes and candidate B receives m votes, where 
n > m. Assuming that all of the (n + m)!/n! m! orderings
of the votes are equally likely, let $P_{n,m}$ denote the probability
that A is always ahead in the counting of the votes.

(a) Compute $P_{2,1}$, $P_{3,1}$, $P_{3,2}$, $P_{4,1}$, $P_{4,2}$, $P_{4,3}$.

(b) Find $P_{n,1}$, $P_{n,2}$.

(c) On the basis of your results in parts (a) and (b), conjecture
the value of $P_{n,m}$.

(d) Derive a recursion for $P_{n,m}$ in terms of $P_{n-1,m}$ and
$P_{n,m-1}$ by conditioning on who receives the last vote.

(e) Use part (d) to verify your conjecture in part (c) by an
induction proof on n + m.
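
The small cases in part (a) of Exercise 3.21 can be tabulated by brute force, which gives data against which a conjecture can be tested. The sketch below enumerates the positions of B's votes and takes "always ahead" to mean strictly ahead after every vote counted; that reading and the function name are assumptions.

```python
from fractions import Fraction
from itertools import combinations

def always_ahead(n, m):
    """P(A is strictly ahead throughout the count), by enumerating orderings."""
    total = favorable = 0
    for b_positions in combinations(range(n + m), m):
        b_set = set(b_positions)
        lead, ok = 0, True
        for i in range(n + m):
            lead += -1 if i in b_set else 1
            if lead <= 0:          # tie or B ahead at some point
                ok = False
                break
        total += 1
        favorable += ok
    return Fraction(favorable, total)

if __name__ == "__main__":
    for n, m in ((2, 1), (3, 1), (3, 2), (4, 1), (4, 2), (4, 3)):
        print(n, m, always_ahead(n, m))
```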


3.22. As a simplified model for weather forecasting, sup- 
pose that the weather (either wet or dry) tomorrow will 
be the same as the weather today with probability p. Show
that if the weather is dry on January 1, then $P_n$, the probability
that it will be dry n days later, satisfies


$$P_n = (2p - 1)P_{n-1} + (1 - p), \quad n \ge 1$$

$$P_0 = 1$$

Prove that

$$P_n = \frac{1}{2} + \frac{1}{2}(2p - 1)^n, \quad n \ge 0$$

3.23. A bag contains a white and b black balls. Balls are 
chosen from the bag according to the following method: 


1. A ball is chosen at random and is discarded. 


2. A second ball is then chosen. If its color is different 
from that of the preceding ball, it is replaced in the 
bag and the process is repeated from the beginning. If 
its color is the same, it is discarded and we start from 
step 2. 


In other words, balls are sampled and discarded until a 
change in color occurs, at which point the last ball is 
returned to the urn and the process starts anew. Let $P_{a,b}$
denote the probability that the last ball in the bag is white.
Prove that


$$P_{a,b} = \frac{1}{2}$$


Hint: Use induction on k = a + b.


*3.24. A round-robin tournament of n contestants is a tournament
in which each of the $\binom{n}{2}$ pairs of contestants
play each other exactly once, with the outcome of any
play being that one of the contestants wins and the other
loses. For a fixed integer k, k < n, a question of interest is
whether it is possible that the tournament outcome is such
that for every set of k players, there is a player who beat
each member of that set. Show that if


$$\binom{n}{k}\left[1 - \left(\frac{1}{2}\right)^k\right]^{n-k} < 1$$


then such an outcome is possible.


Hint: Suppose that the results of the games are independent
and that each game is equally likely to be won by
either contestant. Number the $\binom{n}{k}$ sets of k contestants,
and let $B_i$ denote the event that no contestant beat all of
the k players in the ith set. Then use Boole’s inequality to
bound $P\left(\bigcup_i B_i\right)$.
i 


3.25. Prove directly that 


$$P(E \mid F) = P(E \mid FG)P(G \mid F) + P(E \mid FG^c)P(G^c \mid F)$$


3.26. Prove the equivalence of Equations (5.11) and 
(5.12). 


3.27. Extend the definition of conditional independence to 
more than 2 events. 


3.28. Prove or give a counterexample. If $E_1$ and $E_2$ are
independent, then they are conditionally independent 
given F. 


3.29. In Laplace’s rule of succession (Example 5e), show
that if the first n flips all result in heads, then the
conditional probability that the next m flips also result in
all heads is approximately (n + 1)/(n + m + 1) when k
is large.


3.30. In Laplace’s rule of succession (Example 5e), suppose
that the first n flips resulted in r heads and n − r
tails. Show that the probability that the (n + 1)st flip turns
up heads is (r + 1)/(n + 2). To do so, you will have to
prove and use the identity


$$\int_0^1 y^n (1 - y)^m \, dy = \frac{n!\,m!}{(n + m + 1)!}$$


Hint: To prove the identity, let $C(n, m) = \int_0^1 y^n (1 - y)^m \, dy$.
Integrating by parts yields


$$C(n, m) = \frac{m}{n + 1}\, C(n + 1, m - 1)$$

Starting with C(n, 0) = 1/(n + 1), prove the identity by
induction on m.


3.31. Suppose that a nonmathematical, but philosophically
minded, friend of yours claims that Laplace’s rule of succession
must be incorrect because it can lead to ridiculous
conclusions. “For instance,” says he, “the rule states that
if a boy is 10 years old, having lived 10 years, the boy
has probability 11/12 of living another year. On the other
hand, if the boy has an 80-year-old grandfather, then, by
Laplace’s rule, the grandfather has probability 81/82 of surviving
another year. However, this is ridiculous. Clearly,
the boy is more likely to survive an additional year than
the grandfather is.” How would you answer your friend?


Self-Test Problems and Exercises


3.1. In a game of bridge, West has no aces. What is the
probability of his partner’s having (a) no aces? (b) 2 or
more aces? (c) What would the probabilities be if West
had exactly 1 ace?


3.2. The probability that a new car battery functions for
more than 10,000 miles is .8, the probability that it functions
for more than 20,000 miles is .4, and the probability
that it functions for more than 30,000 miles is .1. If a new
car battery is still working after 10,000 miles, what is the
probability that


(a) its total life will exceed 20,000 miles?
(b) its additional life will exceed 20,000 miles?


3.3. How can 20 balls, 10 white and 10 black, be put into
two urns so as to maximize the probability of drawing a
white ball if an urn is selected at random and a ball is
drawn at random from it?


3.4. Urn A contains 2 white balls and 1 black ball, whereas
urn B contains 1 white ball and 5 black balls. A ball is
drawn at random from urn A and placed in urn B. A ball
is then drawn from urn B. It happens to be white. What is
the probability that the ball transferred was white?


3.5. An urn has r red and w white balls that are randomly
removed one at a time. Let $R_i$ be the event that the ith ball
removed is red. Find


(a) $P(R_i)$
(b) $P(R_5 \mid R_3)$
(c) $P(R_3 \mid R_5)$


3.6. An urn contains b black balls and r red balls. One of
the balls is drawn at random, but when it is put back in
the urn, c additional balls of the same color are put in with
it. Now, suppose that we draw another ball. Show that the
probability that the first ball was black, given that the second
ball drawn was red, is b/(b + r + c).


3.7. A friend randomly chooses two cards, without replacement,
from an ordinary deck of 52 playing cards. In each of
the following situations, determine the conditional probability
that both cards are aces.


(a) You ask your friend if one of the cards is the ace of 
spades, and your friend answers in the affirmative. 


(b) You ask your friend if the first card selected is an ace, 
and your friend answers in the affirmative. 


(c) You ask your friend if the second card selected is an 
ace, and your friend answers in the affirmative. 


(d) You ask your friend if either of the cards selected is an 
ace, and your friend answers in the affirmative. 


3.8. Show that

$$\frac{P(H \mid E)}{P(G \mid E)} = \frac{P(H)}{P(G)} \cdot \frac{P(E \mid H)}{P(E \mid G)}$$

Suppose that, before new evidence is observed, the 
hypothesis H is three times as likely to be true as is the 
hypothesis G. If the new evidence is twice as likely when 
G is true than it is when H is true, which hypothesis is more
likely after the evidence has been observed? 


3.9. You ask your neighbor to water a sickly plant while 
you are on vacation. Without water, it will die with prob- 
ability .8; with water, it will die with probability .15. You 
are 90 percent certain that your neighbor will remember 
to water the plant. 


(a) What is the probability that the plant will be alive when 
you return? 

(b) If the plant is dead upon your return, what is the prob- 
ability that your neighbor forgot to water it? 
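
Self-Test Problem 3.9 combines the law of total probability with Bayes's formula. The sketch below works it with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

p_water = Fraction(9, 10)                      # neighbor remembers to water
p_die = {"watered": Fraction(15, 100),         # P(plant dies | watered)
         "not watered": Fraction(8, 10)}       # P(plant dies | not watered)

# Total probability over whether the plant was watered.
p_dead = p_water * p_die["watered"] + (1 - p_water) * p_die["not watered"]
p_alive = 1 - p_dead

# Bayes's formula for the neighbor having forgotten, given a dead plant.
p_forgot_given_dead = (1 - p_water) * p_die["not watered"] / p_dead

if __name__ == "__main__":
    print(p_alive, p_forgot_given_dead)
```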


3.10. Six balls are to be randomly chosen from an urn con- 
taining 8 red, 10 green, and 12 blue balls. 


(a) What is the probability at least one red ball is chosen? 
(b) Given that no red balls are chosen, what is the con- 
ditional probability that there are exactly 2 green balls 
among the 6 chosen? 


3.11. A type C battery is in working condition with proba- 
bility .7, whereas a type D battery is in working condition 
with probability .4. A battery is randomly chosen from a 
bin consisting of 8 type C and 6 type D batteries. 


(a) What is the probability that the battery works? 


(b) Given that the battery does not work, what is the con- 
ditional probability that it was a type C battery? 


3.12. Maria will take two books with her on a trip. Sup- 
pose that the probability that she will like book 1 is .6, the 
probability that she will like book 2 is .5, and the probabil- 
ity that she will like both books is .4. Find the conditional 
probability that she will like book 2 given that she did not 
like book 1. 


3.13. A detective has evidence that 2 persons were 
involved in the crime he is investigating. Three suspects 
with very similar criminal records were seen close to the 
crime scene. Two of them are believed to be guilty. What 
is the probability that the first suspect being interviewed is 
guilty? 

Considering that the detective has managed to confirm 
one interviewee as having been involved in committing the 
crime, what is the probability that the next person to be 
interviewed is innocent? 


3.14. A coin having probability .8 of landing on heads is 
flipped. A observes the result—either heads or tails—and 
rushes off to tell B. However, with probability .4, A will 
have forgotten the result by the time he reaches B. If A 
has forgotten, then, rather than admitting this to B, he is 
equally likely to tell B that the coin landed on heads or 
that it landed tails. (If he does remember, then he tells B 
the correct result.) 


(a) What is the probability that B is told that the coin 
landed on heads? 

(b) What is the probability that B is told the correct result? 
(c) Given that B is told that the coin landed on heads, what 
is the probability that it did in fact land on heads? 


3.15. In a certain species of rats, black dominates over 
brown. Suppose that a black rat with two black parents 
has a brown sibling. 


(a) What is the probability that this rat is a pure black 
rat (as opposed to being a hybrid with one black and one 
brown gene)? 

(b) Suppose that when the black rat is mated with a brown 
rat, all 5 of their offspring are black. Now what is the prob- 
ability that the rat is a pure black rat? 


3.16. (a) In Problem 3.70b, find the probability that a cur- 
rent flows from A to B, by conditioning on whether relay 
1 closes. 

(b) Find the conditional probability that relay 3 is closed 
given that a current flows from A to B. 


3.17. For the k-out-of-n system described in Problem 3.71,
assume that each component independently works with
probability 1/2. Find the conditional probability that
component 1 is working, given that the system works,
when
(a) k = 1, n = 2;
(b) k = 2, n = 3.


3.18. Mr. Jones has devised a gambling system for win- 
ning at roulette. When he bets, he bets on red and places 
a bet only when the 10 previous spins of the roulette have 
landed on a black number. He reasons that his chance of 
winning is quite large because the probability of 11 con- 
secutive spins resulting in black is quite small. What do 
you think of this system? 


3.19. Three players simultaneously toss coins. The coin 
tossed by A(B)[C] turns up heads with probability
$P_1$($P_2$)[$P_3$]. If one person gets an outcome different from
those of the other two, then he is the odd man out. If there 
is no odd man out, the players flip again and continue to do 
so until they get an odd man out. What is the probability 
that A will be the odd man? 


3.20. Suppose that there are n possible outcomes of a
trial, with outcome i resulting with probability $p_i$,
$i = 1, \ldots, n$, $\sum_{i=1}^{n} p_i = 1$. If two independent trials are observed,
what is the probability that the result of the second trial is
larger than that of the first?


3.21. If A flips n + 1 and B flips n fair coins, show that the
probability that A gets more heads than B is 1/2.


Hint: Condition on which player has more heads after each 
has flipped n coins. (There are three possibilities.) 


3.22. Prove or give counterexamples to the following 
statements: 


(a) If E is independent of F and E is independent of G,
then E is independent of $F \cup G$.


(b) If E is independent of F, and E is independent of G,
and $FG = \emptyset$, then E is independent of $F \cup G$.


(c) If E is independent of F, and F is independent of G, 
and E is independent of FG, then G is independent of EF. 


3.23. Let A and B be events having positive probability. 
State whether each of the following statements is (i) nec- 
essarily true, (ii) necessarily false, or (iii) possibly true. 


(a) If A and B are mutually exclusive, then they are inde- 
pendent. 


(b) If A and B are independent, then they are mutually 
exclusive. 


(c) P(A) = P(B) = .6, and A and B are mutually exclusive. 
(d) P(A) = P(B) = .6, and A and B are independent. 

3.24. Rank the following from most likely to least likely to
occur:

1. A fair coin lands on heads.
2. Three independent trials, each of which is a success with
probability .8, all result in successes.
3. Seven independent trials, each of which is a success with
probability .9, all result in successes.


3.25. Two local factories, A and B, produce radios. Each 
radio produced at factory A is defective with probability 
.05, whereas each one produced at factory B is defective 
with probability .01. Suppose you purchase two radios that 
were produced at the same factory, which is equally likely 
to have been either factory A or factory B. If the first radio 
that you check is defective, what is the conditional proba- 
bility that the other one is also defective? 


3.26. Show that if $P(A|B) = 1$, then $P(B^c|A^c) = 1$.


3.27. An urn initially contains 1 red and 1 blue ball. At 
each stage, a ball is randomly withdrawn and replaced by 
two other balls of the same color. (For instance, if the red 
ball is initially chosen, then there would be 2 red and 1 
blue balls in the urn when the next selection occurs.) Show 
by mathematical induction that the probability that there
are exactly i red balls in the urn after n stages have been
completed is $\frac{1}{n+1}$, $1 \le i \le n + 1$.


3.28. A total of 2n cards, of which 2 are aces, are to be
randomly divided among two players, with each player
receiving n cards. Each player is then to declare, in sequence,
whether he or she has received any aces. What is the condi- 
tional probability that the second player has no aces, given 
that the first player declares in the affirmative, when (a) 
n = 2? (b) n = 10? (c) n = 100? To what does the proba- 
bility converge as n goes to infinity? Why? 


3.29. There are n distinct types of coupons, and each
coupon obtained is, independently of prior types collected,
of type i with probability $p_i$, $\sum_{i=1}^{n} p_i = 1$.


(a) If n coupons are collected, what is the probability that
one of each type is obtained?

(b) Now suppose that $p_1 = p_2 = \cdots = p_n = 1/n$. Let $E_i$
be the event that there are no type i coupons among the
n collected. Apply the inclusion-exclusion identity for the
probability of the union of events to $P(\cup_i E_i)$ to prove the
identity

$$n! = \sum_{k=0}^{n} (-1)^k \binom{n}{k} (n - k)^n$$


3.30. Show that for any events E and F,


$$P(E \mid E \cup F) \ge P(E \mid F)$$


Hint: Compute P(E|E U F) by conditioning on whether F 
occurs. 


3.31. (a) If the odds of A are 2/3, what is the probability that
A occurs?

(b) If the odds of A are 5, what is the probability that A
occurs?


3.32. A fair coin is flipped 3 times. Let E be the event that 
all flips land heads. 


(a) What are the odds of the event E?


(b) What are the conditional odds of the event E given that
at least one of the coins landed heads?


3.33. If the events E, F, G are independent, show that
$P(E \mid FG^c) = P(E)$.


3.34. Players 1, 2, 3 are in a contest. Two of them are
randomly chosen to play a game in round one, with the winner
then playing the remaining player in round two. The
winner of the round two game is the winner of the contest.
Assuming that all games are independent and that i wins
when playing against j with probability $\frac{i}{i+j}$, find the
probability that 1 is the winner of the contest. Given that 1 is the
winner, what is the conditional probability that 1 did not
play in the first round?


3.35. If 4 balls are randomly chosen from an urn contain- 
ing 4 red, 5 white, 6 blue, and 7 green balls, find the condi- 
tional probability they are all white given that all balls are 
of the same color. 


3.36. In a 4 player tournament, player 1 plays player 2,
player 3 plays player 4, with the winners then playing for
the championship. Suppose that a game between player i
and player j is won by player i with probability $\frac{i}{i+j}$. Find
the probability that player 1 wins the championship.


3.37. In a tournament involving players 1,...,n, players 1 
and 2 play a game, with the loser departing and the winner 
then playing against player 3, with the loser of that game 
departing and the winner then playing player 4, and so on. 
The winner of the game against player n is the tournament 
winner. Suppose that a game between players i and j is won
by player i with probability $\frac{i}{i+j}$.

(a) Find the probability that player 3 is the tournament
winner.

(b) If n = 4, find the probability that player 4 is the
tournament winner.


4.1 Random Variables


Example 1a


When an experiment is performed, we are frequently interested mainly in some func- 
tion of the outcome as opposed to the actual outcome itself. For instance, in tossing 
dice, we are often interested in the sum of the two dice and are not really concerned 
about the separate values of each die. That is, we may be interested in knowing 
that the sum is 7 and may not be concerned over whether the actual outcome was 
(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), or (6, 1). Also, in flipping a coin, we may be inter- 
ested in the total number of heads that occur and not care at all about the actual 
head-tail sequence that results. These quantities of interest, or, more formally, these 
real-valued functions defined on the sample space, are known as random variables. 

Because the value of a random variable is determined by the outcome of the 
experiment, we may assign probabilities to the possible values of the random 
variable. 


Suppose that our experiment consists of tossing 3 fair coins. If we let Y denote the
number of heads that appear, then Y is a random variable taking on one of the values
0, 1, 2, and 3 with respective probabilities

$$
\begin{aligned}
P\{Y = 0\} &= P\{(t, t, t)\} = \tfrac{1}{8} \\
P\{Y = 1\} &= P\{(t, t, h), (t, h, t), (h, t, t)\} = \tfrac{3}{8} \\
P\{Y = 2\} &= P\{(t, h, h), (h, t, h), (h, h, t)\} = \tfrac{3}{8} \\
P\{Y = 3\} &= P\{(h, h, h)\} = \tfrac{1}{8}
\end{aligned}
$$

Example 1b


Example 1c


Example 1d


Since Y must take on one of the values 0 through 3, we must have

$$1 = P\left(\bigcup_{i=0}^{3} \{Y = i\}\right) = \sum_{i=0}^{3} P\{Y = i\}$$

which, of course, is in accord with the preceding probabilities.


A life insurance agent has 2 elderly clients, each of whom has a life insurance policy 
that pays $100,000 upon death. Let Y be the event that the younger one dies in the 
following year, and let O be the event that the older one dies in the following year. 
Assume that Y and O are independent, with respective probabilities P(Y) = .05 and 
P(O) = .10. If X denotes the total amount of money (in units of $100,000) that 
will be paid out this year to any of these clients’ beneficiaries, then X is a random 
variable that takes on one of the possible values 0, 1, 2 with respective probabilities

$$
\begin{aligned}
P\{X = 0\} &= P(Y^c O^c) = P(Y^c)P(O^c) = (.95)(.9) = .855 \\
P\{X = 1\} &= P(Y O^c) + P(Y^c O) = (.05)(.9) + (.95)(.1) = .140 \\
P\{X = 2\} &= P(YO) = (.05)(.1) = .005
\end{aligned}
$$


Four balls are to be randomly selected, without replacement, from an urn that con- 
tains 20 balls numbered 1 through 20. If X is the largest numbered ball selected, then 
X is a random variable that takes on one of the values 4,5,...,20. Because each of 
the $\binom{20}{4}$ possible selections of 4 of the 20 balls is equally likely, the probability that X
takes on each of its possible values is

$$P\{X = i\} = \frac{\binom{i-1}{3}}{\binom{20}{4}}, \qquad i = 4, \ldots, 20$$

This is so because the number of selections that result in X = i is the number of
selections that result in ball numbered i and three of the balls numbered 1 through
i - 1 being selected. As there are $\binom{1}{1}\binom{i-1}{3}$ such selections, the preceding equation
follows.

Suppose now that we want to determine P{X > 10}. One way, of course, is to
just use the preceding to obtain

$$P\{X > 10\} = \sum_{i=11}^{20} P\{X = i\} = \sum_{i=11}^{20} \frac{\binom{i-1}{3}}{\binom{20}{4}}$$

However, a more direct approach for determining P{X > 10} would be to use

$$P\{X > 10\} = 1 - P\{X \le 10\} = 1 - \frac{\binom{10}{4}}{\binom{20}{4}}$$

where the preceding results because X will be less than or equal to 10 when the 4
balls chosen are among balls numbered 1 through 10.
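
As a numerical check on the two routes to P{X > 10} in this example, the following Python sketch evaluates both with exact fractions. The function name `pmf_largest` and its default arguments are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from math import comb

def pmf_largest(i, total=20, drawn=4):
    """P{X = i}: ball i is chosen together with drawn-1 of the balls 1..i-1."""
    return Fraction(comb(i - 1, drawn - 1), comb(total, drawn))

# Route 1: sum the mass function over i = 11, ..., 20.
by_summation = sum(pmf_largest(i) for i in range(11, 21))

# Route 2: complement of "all 4 chosen balls lie among balls 1..10".
by_complement = 1 - Fraction(comb(10, 4), comb(20, 4))

print(by_summation, by_complement, float(by_complement))
```

Exact rational arithmetic is used so that the two routes can be compared for equality rather than to within rounding error.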


Independent trials consisting of the flipping of a coin having probability p of coming 
up heads are continually performed until either a head occurs or a total of n flips is 
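made. If we let X denote the number of times the coin is flipped, then X is a random
variable taking on one of the values 1, 2, 3, ..., n with respective probabilities

$$
\begin{aligned}
P\{X = 1\} &= P\{h\} = p \\
P\{X = 2\} &= P\{(t, h)\} = (1 - p)p \\
P\{X = 3\} &= P\{(t, t, h)\} = (1 - p)^2 p \\
&\;\;\vdots \\
P\{X = n - 1\} &= P\{(\underbrace{t, t, \ldots, t}_{n-2}, h)\} = (1 - p)^{n-2} p \\
P\{X = n\} &= P\{(\underbrace{t, t, \ldots, t}_{n-1}, t), (\underbrace{t, t, \ldots, t}_{n-1}, h)\} = (1 - p)^{n-1}
\end{aligned}
$$

As a check, note that

$$
\begin{aligned}
P\left(\bigcup_{i=1}^{n} \{X = i\}\right) &= \sum_{i=1}^{n} P\{X = i\} \\
&= \sum_{i=1}^{n-1} p(1 - p)^{i-1} + (1 - p)^{n-1} \\
&= p\left[\frac{1 - (1 - p)^{n-1}}{1 - (1 - p)}\right] + (1 - p)^{n-1} \\
&= 1 - (1 - p)^{n-1} + (1 - p)^{n-1} \\
&= 1
\end{aligned}
$$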
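
The same check can be carried out numerically. The sketch below builds the mass function of this example for given p and n and confirms that the probabilities sum to 1; the function name `flips_pmf` and the sample values p = 0.3, n = 6 are assumptions made only for illustration.

```python
def flips_pmf(p, n):
    """Return {i: P{X = i}} for flipping until the first head or until n flips."""
    # For i < n the experiment stops at flip i: i-1 tails followed by a head.
    pmf = {i: (1 - p) ** (i - 1) * p for i in range(1, n)}
    # X = n whenever the first n-1 flips are tails; the nth flip is unrestricted.
    pmf[n] = (1 - p) ** (n - 1)
    return pmf

pmf = flips_pmf(p=0.3, n=6)
print(pmf)
print(sum(pmf.values()))
```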


Example 1e

Suppose that there are r distinct types of coupons and that each time one obtains a
coupon, it is, independently of previous selections, equally likely to be any one of the
r types. One random variable of interest is T, the number of coupons that need to be
collected until one obtains a complete set of at least one of each type. Rather than
derive P{T = n} directly, let us start by considering the probability that T is greater
than n. To do so, fix n and define the events $A_1, A_2, \ldots, A_r$ as follows: $A_j$ is the event
that no type j coupon is contained among the first n coupons collected, $j = 1, \ldots, r$.
Hence, by the inclusion-exclusion identity,

$$
\begin{aligned}
P\{T > n\} = P\left(\bigcup_{j=1}^{r} A_j\right)
&= \sum_{j} P(A_j) - \mathop{\sum\sum}_{j_1 < j_2} P(A_{j_1} A_{j_2}) + \cdots \\
&\quad + (-1)^{k+1} \mathop{\sum\sum\cdots\sum}_{j_1 < j_2 < \cdots < j_k} P(A_{j_1} A_{j_2} \cdots A_{j_k}) + \cdots \\
&\quad + (-1)^{r+1} P(A_1 A_2 \cdots A_r)
\end{aligned}
$$




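Now, $A_j$ will occur if each of the n coupons collected is not of type j. Since each of
the coupons will not be of type j with probability (r - 1)/r, we have, by the assumed
independence of the types of successive coupons,

$$P(A_j) = \left(\frac{r - 1}{r}\right)^n$$

Also, the event $A_{j_1} A_{j_2}$ will occur if none of the first n coupons collected is of either
type $j_1$ or type $j_2$. Thus, again using independence, we see that

$$P(A_{j_1} A_{j_2}) = \left(\frac{r - 2}{r}\right)^n$$

The same reasoning gives

$$P(A_{j_1} A_{j_2} \cdots A_{j_k}) = \left(\frac{r - k}{r}\right)^n$$

and we see that for n > 0,

$$
\begin{aligned}
P\{T > n\} &= r\left(\frac{r-1}{r}\right)^n - \binom{r}{2}\left(\frac{r-2}{r}\right)^n + \binom{r}{3}\left(\frac{r-3}{r}\right)^n - \cdots + (-1)^r \binom{r}{r-1}\left(\frac{1}{r}\right)^n \\
&= \sum_{i=1}^{r-1} \binom{r}{i}\left(\frac{r-i}{r}\right)^n (-1)^{i+1}
\end{aligned}
\tag{1.1}
$$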

The probability that T equals n can now be obtained from the preceding formula by
the use of

$$P\{T > n - 1\} = P\{T = n\} + P\{T > n\}$$

or, equivalently,

$$P\{T = n\} = P\{T > n - 1\} - P\{T > n\}$$
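
Equation (1.1) and the difference formula for P{T = n} translate directly into code. The following Python sketch is one possible implementation; the function names and the choice r = 6 are illustrative assumptions. Because (1.1) is stated only for n > 0, the case n = 0 is handled separately (P{T > 0} = 1).

```python
from math import comb

def prob_T_greater(n, r):
    """P{T > n} from Equation (1.1); the formula itself is valid for n >= 1."""
    if n == 0:
        return 1.0  # no coupons collected yet, so the set is certainly incomplete
    return sum((-1) ** (i + 1) * comb(r, i) * ((r - i) / r) ** n
               for i in range(1, r))

def prob_T_equals(n, r):
    """P{T = n} = P{T > n - 1} - P{T > n}, for n >= 1."""
    return prob_T_greater(n - 1, r) - prob_T_greater(n, r)

r = 6
print(prob_T_greater(10, r))
# The mass function should sum to (essentially) 1 over a long enough range.
print(sum(prob_T_equals(n, r) for n in range(1, 200)))
```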


Another random variable of interest is the number of distinct types of coupons
that are contained in the first n selections; call this random variable $D_n$. To compute
$P\{D_n = k\}$, let us start by fixing attention on a particular set of k distinct types,
and let us then determine the probability that this set constitutes the set of distinct
types obtained in the first n selections. Now, in order for this to be the situation, it is
necessary and sufficient that of the first n coupons obtained,


A: each is one of these k types 
B: each of these k types is represented 


Now, each coupon selected will be one of the k types with probability k/r, so the
probability that A will be valid is $(k/r)^n$. Also, given that a coupon is of one of the
k types under consideration, it is easy to see that it is equally likely to be of any one
of these k types. Hence, the conditional probability of B given that A occurs is the
same as the probability that a set of n coupons, each equally likely to be any of k
possible types, contains a complete set of all k types. But this is just the probability
that the number needed to amass a complete set, when choosing among k types, is
less than or equal to n and is thus obtainable from Equation (1.1) with k replacing r.
Thus, we have

$$P(A) = \left(\frac{k}{r}\right)^n$$

$$P(B \mid A) = 1 - \sum_{i=1}^{k-1} \binom{k}{i}\left(\frac{k - i}{k}\right)^n (-1)^{i+1}$$

Finally, as there are $\binom{r}{k}$ possible choices for the set of k types, we arrive at

$$
\begin{aligned}
P\{D_n = k\} &= \binom{r}{k} P(AB) \\
&= \binom{r}{k}\left(\frac{k}{r}\right)^n \left[1 - \sum_{i=1}^{k-1} \binom{k}{i}\left(\frac{k - i}{k}\right)^n (-1)^{i+1}\right]
\end{aligned}
$$
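
The formula for $P\{D_n = k\}$ can be sanity-checked by confirming that the probabilities sum to 1 over k = 1, ..., r. The sketch below does this in Python; the function names and the sample values n = 5, r = 4 are assumptions for illustration, and n is assumed to be at least 1.

```python
from math import comb

def prob_complete_set(n, k):
    """P(B|A): n coupons, each equally likely to be any of k types, cover all k types."""
    return 1 - sum((-1) ** (i + 1) * comb(k, i) * ((k - i) / k) ** n
                   for i in range(1, k))

def prob_distinct_types(n, k, r):
    """P{D_n = k} = C(r, k) (k/r)^n P(B|A)."""
    return comb(r, k) * (k / r) ** n * prob_complete_set(n, k)

n, r = 5, 4
pmf = [prob_distinct_types(n, k, r) for k in range(1, r + 1)]
print(pmf)
print(sum(pmf))
```

Floating-point arithmetic is used here, so values that are exactly 0 in theory (for instance when k > n) may appear as very small numbers.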


Remark We can obtain a useful bound on $P(T > n) = P(\cup_{j=1}^{r} A_j)$ by using Boole's
inequality along with the inequality $e^{-x} \ge 1 - x$.

$$
\begin{aligned}
P(T > n) &= P\left(\bigcup_{j=1}^{r} A_j\right) \\
&\le \sum_{j=1}^{r} P(A_j) \\
&= r\left(1 - \frac{1}{r}\right)^n \\
&\le r e^{-n/r}
\end{aligned}
$$

The first inequality is Boole's inequality, which says that the probability of the union
of events is always less than or equal to the sum of the probabilities of these events,
and the last inequality uses that $e^{-1/r} \ge 1 - 1/r$.


For a random variable X, the function F defined by

$$F(x) = P\{X \le x\}, \qquad -\infty < x < \infty$$


is called the cumulative distribution function or, more simply, the distribution func- 
tion of X. Thus, the distribution function specifies, for all real values x, the probability 
that the random variable is less than or equal to x. 

Now, suppose that $a \le b$. Then, because the event $\{X \le a\}$ is contained in the
event $\{X \le b\}$, it follows that F(a), the probability of the former, is less than or equal
to F(b), the probability of the latter. In other words, F(x) is a nondecreasing function 
of x. Other general properties of the distribution function are given in Section 4.10. 


4.2 Discrete Random Variables 


A random variable that can take on at most a countable number of possible values is 
said to be discrete. For a discrete random variable X, we define the probability mass 
function p(a) of X by 


$$p(a) = P\{X = a\}$$

The probability mass function p(a) is positive for at most a countable number of
values of a. That is, if X must assume one of the values $x_1, x_2, \ldots$, then

$$
\begin{aligned}
p(x_i) &\ge 0 \quad \text{for } i = 1, 2, \ldots \\
p(x) &= 0 \quad \text{for all other values of } x
\end{aligned}
$$

Since X must take on one of the values $x_i$, we have

$$\sum_{i=1}^{\infty} p(x_i) = 1$$


It is often instructive to present the probability mass function in a graphical 
format by plotting p(x;) on the y-axis against x; on the x-axis. For instance, if the 
probability mass function of X is 


$$p(0) = \frac{1}{4}, \qquad p(1) = \frac{1}{2}, \qquad p(2) = \frac{1}{4}$$

we can represent this function graphically as shown in Figure 4.1. Similarly, a graph 


of the probability mass function of the random variable representing the sum when 
two dice are rolled looks like Figure 4.2. 
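
For readers without the figures at hand, the mass function of the sum of two fair dice can be tabulated by enumerating the 36 equally likely outcomes. The Python sketch below does so and prints a crude text bar chart; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each of the 36 ordered outcomes (i, j) of two fair dice has probability 1/36.
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))
pmf = {total: Fraction(count, 36) for total, count in sorted(counts.items())}

for total, prob in pmf.items():
    print(total, prob, "#" * counts[total])  # one '#' per outcome giving this sum
```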


[Figure 4.1: p(x) plotted against x]

[Figure 4.2]


Example 2a

The probability mass function of a random variable X is given by $p(i) = c\lambda^i / i!$,
$i = 0, 1, 2, \ldots$, where $\lambda$ is some positive value. Find (a) P{X = 0} and (b) P{X > 2}.

Solution Since $\sum_{i=0}^{\infty} p(i) = 1$, we have

$$c \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = 1$$

which, because $e^x = \sum_{i=0}^{\infty} x^i / i!$, implies that

$$c e^{\lambda} = 1 \quad \text{or} \quad c = e^{-\lambda}$$

Hence,

(a) $P\{X = 0\} = e^{-\lambda} \lambda^0 / 0! = e^{-\lambda}$

(b) $P\{X > 2\} = 1 - P\{X \le 2\} = 1 - P\{X = 0\} - P\{X = 1\} - P\{X = 2\}$

$$= 1 - e^{-\lambda} - \lambda e^{-\lambda} - \frac{\lambda^2 e^{-\lambda}}{2}$$

The cumulative distribution function F can be expressed in terms of p(a) by

$$F(a) = \sum_{\text{all } x \le a} p(x)$$


If X is a discrete random variable whose possible values are $x_1, x_2, x_3, \ldots$, where
$x_1 < x_2 < x_3 < \cdots$, then the distribution function F of X is a step function. That
is, the value of F is constant in the intervals $(x_{i-1}, x_i)$ and then takes a step (or jump)
of size $p(x_i)$ at $x_i$. For instance, if X has a probability mass function given by

$$p(1) = \frac{1}{4}, \qquad p(2) = \frac{1}{2}, \qquad p(3) = \frac{1}{8}, \qquad p(4) = \frac{1}{8}$$

then its cumulative distribution function is

$$
F(a) = \begin{cases}
0 & a < 1 \\
\frac{1}{4} & 1 \le a < 2 \\
\frac{3}{4} & 2 \le a < 3 \\
\frac{7}{8} & 3 \le a < 4 \\
1 & 4 \le a
\end{cases}
$$

This function is depicted graphically in Figure 4.3.

[Figure 4.3]

Note that the size of the step at any of the values 1, 2, 3, and 4 is equal to the
probability that X assumes that particular value.
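
The step behavior of F is easy to see by evaluating it at a few points. The following Python sketch, with an illustrative function name `cdf`, computes F(a) for the mass function above by summing p(x) over all x ≤ a.

```python
from fractions import Fraction

pmf = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 8), 4: Fraction(1, 8)}

def cdf(a):
    """F(a) = sum of p(x) over all possible values x <= a."""
    return sum((p for x, p in pmf.items() if x <= a), Fraction(0))

# F is flat between possible values and jumps by p(x) at each possible value x.
for a in (0.5, 1, 1.5, 2, 3.9, 4):
    print(a, cdf(a))
```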


4.3 Expected Value 


Example 
3a 


One of the most important concepts in probability theory is that of the expectation 
of a random variable. If X is a discrete random variable having a probability mass 
function p(x), then the expectation, or the expected value, of X, denoted by E[X], is 
defined by 


$$E[X] = \sum_{x: p(x) > 0} x\,p(x)$$


In words, the expected value of X is a weighted average of the possible values that 
X can take on, each value being weighted by the probability that X assumes it. For 
instance, on the one hand, if the probability mass function of X is given by 


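$$p(0) = \frac{1}{2} = p(1)$$

then

$$E[X] = 0\left(\frac{1}{2}\right) + 1\left(\frac{1}{2}\right) = \frac{1}{2}$$

is just the ordinary average of the two possible values, 0 and 1, that X can assume.
On the other hand, if

$$p(0) = \frac{1}{3}, \qquad p(1) = \frac{2}{3}$$

then

$$E[X] = 0\left(\frac{1}{3}\right) + 1\left(\frac{2}{3}\right) = \frac{2}{3}$$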


is a weighted average of the two possible values 0 and 1, where the value 1 is given 
twice as much weight as the value 0, since p(1) = 2p(0). 

Another motivation of the definition of expectation is provided by the frequency 
interpretation of probabilities. This interpretation (partially justified by the strong 
law of large numbers, to be presented in Chapter 8) assumes that if an infinite 
sequence of independent replications of an experiment is performed, then, for any 
event E, the proportion of time that E occurs will be P(E). Now, consider a random 
variable X that must take on one of the values $x_1, x_2, \ldots, x_n$ with respective
probabilities $p(x_1), p(x_2), \ldots, p(x_n)$, and think of X as representing our winnings in a single
game of chance. That is, with probability $p(x_i)$, we shall win $x_i$ units, $i = 1, 2, \ldots, n$. By
the frequency interpretation, if we play this game continually, then the proportion of
time that we win $x_i$ will be $p(x_i)$. Since this is true for all i, $i = 1, 2, \ldots, n$, it follows
that our average winnings per game will be

$$\sum_{i=1}^{n} x_i\,p(x_i) = E[X]$$


Find E[X], where X is the outcome when we roll a fair die. 


Solution Since $p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = \frac{1}{6}$, we obtain

$$E[X] = 1\left(\frac{1}{6}\right) + 2\left(\frac{1}{6}\right) + 3\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 5\left(\frac{1}{6}\right) + 6\left(\frac{1}{6}\right) = \frac{7}{2}$$


Example 3b

We say that I is an indicator variable for the event A if

$$
I = \begin{cases}
1 & \text{if } A \text{ occurs} \\
0 & \text{if } A^c \text{ occurs}
\end{cases}
$$

Find E[I].

Solution Since p(1) = P(A), p(0) = 1 - P(A), we have

$$E[I] = P(A)$$

That is, the expected value of the indicator variable for the event A is equal to the
probability that A occurs.


Example 3c


A contestant on a quiz show is presented with two questions, questions 1 and 2, which 
he is to attempt to answer in some order he chooses. If he decides to try question i 
first, then he will be allowed to go on to question j, j # i, only if his answer to question 
i is correct. If his initial answer is incorrect, he is not allowed to answer the other 
question. The contestant is to receive $V_i$ dollars if he answers question i correctly,
i = 1, 2. For instance, he will receive $V_1 + V_2$ dollars if he answers both questions
correctly. If the probability that he knows the answer to question i is $P_i$, i = 1, 2,
which question should he attempt to answer first so as to maximize his expected
winnings? Assume that the events $E_i$, i = 1, 2, that he knows the answer to question
i are independent events.


Solution On the one hand, if he attempts to answer question 1 first, then he will win 


0 with probability $1 - P_1$
$V_1$ with probability $P_1(1 - P_2)$
$V_1 + V_2$ with probability $P_1 P_2$


Hence, his expected winnings in this case will be

$$V_1 P_1 (1 - P_2) + (V_1 + V_2) P_1 P_2$$

On the other hand, if he attempts to answer question 2 first, his expected winnings
will be

$$V_2 P_2 (1 - P_1) + (V_1 + V_2) P_1 P_2$$

Therefore, it is better to try question 1 first if

$$V_1 P_1 (1 - P_2) \ge V_2 P_2 (1 - P_1)$$

or, equivalently, if

$$\frac{V_1 P_1}{1 - P_1} \ge \frac{V_2 P_2}{1 - P_2}$$

For example, if he is 60 percent certain of answering question 1, worth $200, correctly
and he is 80 percent certain of answering question 2, worth $100, correctly, then he
should attempt to answer question 2 first because

$$400 = \frac{(100)(.8)}{.2} > \frac{(200)(.6)}{.4} = 300$$
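
The conclusion of this example can be confirmed by computing the expected winnings for each ordering directly. The Python sketch below does this for the numbers in the example; the function name and the (value, probability) pair representation are illustrative choices.

```python
def expected_winnings(first, second):
    """Expected winnings when question `first` is attempted before `second`.

    Each argument is a (value, probability of a correct answer) pair.
    """
    v1, p1 = first
    v2, p2 = second
    # Win v1 alone if only the first answer is correct; win v1 + v2 if both are.
    return v1 * p1 * (1 - p2) + (v1 + v2) * p1 * p2

q1 = (200, 0.6)  # question 1: worth $200, answered correctly with probability .6
q2 = (100, 0.8)  # question 2: worth $100, answered correctly with probability .8

print(expected_winnings(q1, q2))  # question 1 attempted first
print(expected_winnings(q2, q1))  # question 2 attempted first
```

Attempting question 2 first gives the larger expected winnings, in agreement with the ratio comparison above.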


Example 3d

A school class of 120 students is driven in 3 buses to a symphonic performance. There
are 36 students in one of the buses, 40 in another, and 44 in the third bus. When the
buses arrive, one of the 120 students is randomly chosen. Let X denote the number
of students on the bus of that randomly chosen student, and find E[X].


Solution Since the randomly chosen student is equally likely to be any of the 120
students, it follows that

$$P\{X = 36\} = \frac{36}{120}, \qquad P\{X = 40\} = \frac{40}{120}, \qquad P\{X = 44\} = \frac{44}{120}$$

Hence,

$$E[X] = 36\left(\frac{3}{10}\right) + 40\left(\frac{1}{3}\right) + 44\left(\frac{11}{30}\right) = \frac{1208}{30} = 40.2667$$


However, the average number of students on a bus is 120/3 = 40, showing that 
the expected number of students on the bus of a randomly chosen student is larger 
than the average number of students on a bus. This is a general phenomenon, and 
it occurs because the more students there are on a bus, the more likely it is that 
a randomly chosen student would have been on that bus. As a result, buses with 
many students are given more weight than those with fewer students. (See Self-Test
Problem 4.4.)
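
The contrast between the two averages in this example can be reproduced in a few lines. The following Python sketch, with illustrative variable names, computes the expected bus size seen by a randomly chosen student and the plain average bus size.

```python
from fractions import Fraction

bus_sizes = [36, 40, 44]
total = sum(bus_sizes)  # 120 students in all

# A randomly chosen student is on a bus of size s with probability s/120.
expected_for_student = sum(s * Fraction(s, total) for s in bus_sizes)

# A randomly chosen bus has each of the three sizes with probability 1/3.
average_bus_size = Fraction(total, len(bus_sizes))

print(float(expected_for_student), float(average_bus_size))
```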


Remark The probability concept of expectation is analogous to the physical
concept of the center of gravity of a distribution of mass. Consider a discrete random
variable X having probability mass function $p(x_i)$, $i \ge 1$. If we now imagine a
weightless rod in which weights with mass $p(x_i)$, $i \ge 1$, are located at the points $x_i$, $i \ge 1$
(see Figure 4.4), then the point at which the rod would be in balance is known as the
center of gravity. For those readers acquainted with elementary statics, it is now a
simple matter to show that this point is at E[X].


[Figure 4.4: weights p(-1) = .10, p(0) = .25, p(1) = .30, p(2) = .35 located at the points -1, 0, 1, 2 of a rod; center of gravity = .9]


4.4 Expectation of a Function of a Random Variable 


Suppose that we are given a discrete random variable along with its probability mass 
function and that we want to compute the expected value of some function of X, say, 
g(X). How can we accomplish this? One way is as follows: Since g(X) is itself a dis- 
crete random variable, it has a probability mass function, which can be determined 
from the probability mass function of X. Once we have determined the probability
mass function of g(X), we can compute E[g(X)] by using the definition of expected
value.

Footnote to the Remark on the center of gravity: To prove that the balance point is at E[X], we must
show that the sum of the torques tending to turn the point around E[X] is equal to 0.
That is, we must show that $0 = \sum_i (x_i - E[X])\,p(x_i)$, which is immediate.


Example 4a

Let X denote a random variable that takes on any of the values -1, 0, and 1 with
respective probabilities

$$P\{X = -1\} = .2, \qquad P\{X = 0\} = .5, \qquad P\{X = 1\} = .3$$

Compute $E[X^2]$.

Solution Let $Y = X^2$. Then the probability mass function of Y is given by

$$
\begin{aligned}
P\{Y = 1\} &= P\{X = -1\} + P\{X = 1\} = .5 \\
P\{Y = 0\} &= P\{X = 0\} = .5
\end{aligned}
$$

Hence,

$$E[X^2] = E[Y] = 1(.5) + 0(.5) = .5$$

Note that

$$.5 = E[X^2] \ne (E[X])^2 = .01$$

Although the preceding procedure will always enable us to compute the expec- 
ted value of any function of X from a knowledge of the probability mass function 
of X, there is another way of thinking about E[g(X)]: Since g(X) will equal g(x) 
whenever X is equal to x, it seems reasonable that E[g(X)] should just be a weighted 
average of the values g(x), with g(x) being weighted by the probability that X is equal 
to x. That is, the following result is quite intuitive. 
Proposition 4.1

If X is a discrete random variable that takes on one of the values $x_i$, $i \ge 1$, with
respective probabilities $p(x_i)$, then, for any real-valued function g,

$$E[g(X)] = \sum_i g(x_i)\,p(x_i)$$


Before proving this proposition, let us check that it is in accord with the results of 
Example 4a. Applying it to that example yields 
$$
\begin{aligned}
E[X^2] &= (-1)^2(.2) + 0^2(.5) + 1^2(.3) \\
&= 1(.2 + .3) + 0(.5) \\
&= .5
\end{aligned}
$$


which is in agreement with the result given in Example 4a. 
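
The two routes to $E[X^2]$ in Example 4a can be compared side by side. The Python sketch below first builds the mass function of Y = g(X) and applies the definition of expectation, then applies Proposition 4.1 directly; the variable names are illustrative, and exact fractions are used to avoid rounding.

```python
from collections import defaultdict
from fractions import Fraction

pmf_x = {-1: Fraction(2, 10), 0: Fraction(5, 10), 1: Fraction(3, 10)}

def g(x):
    return x ** 2

# Route 1: build the mass function of Y = g(X), then apply the definition of E[Y].
pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p
via_pmf_of_y = sum(y * p for y, p in pmf_y.items())

# Route 2: Proposition 4.1, weighting g(x) by p(x) directly.
via_proposition = sum(g(x) * p for x, p in pmf_x.items())

mean_x = sum(x * p for x, p in pmf_x.items())
print(via_pmf_of_y, via_proposition, mean_x ** 2)
```

The grouping step in Route 1 is the same grouping of terms with a common value of g(x) that the proof of the proposition relies on.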


Proof of Proposition 4.1 The proof of Proposition 4.1 proceeds, as in the preceding
verification, by grouping together all the terms in $\sum_i g(x_i)\,p(x_i)$ having the same value
of $g(x_i)$. Specifically, suppose that $y_j$, $j \ge 1$, represent the different values of $g(x_i)$, $i \ge 1$.
Then, grouping all the $g(x_i)$ having the same value gives

$$
\begin{aligned}
\sum_i g(x_i)\,p(x_i) &= \sum_j \sum_{i: g(x_i) = y_j} g(x_i)\,p(x_i) \\
&= \sum_j \sum_{i: g(x_i) = y_j} y_j\,p(x_i) \\
&= \sum_j y_j \sum_{i: g(x_i) = y_j} p(x_i) \\
&= \sum_j y_j\,P\{g(X) = y_j\} \\
&= E[g(X)]
\end{aligned}
$$


Example 4b

A product that is sold seasonally yields a net profit of b dollars for each unit sold and
a net loss of $\ell$ dollars for each unit left unsold when the season ends. The number
of units of the product that are ordered at a specific department store during any
season is a random variable having probability mass function p(i), $i \ge 0$. If the store
must stock this product in advance, determine the number of units the store should
stock so as to maximize its expected profit.


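Solution Let X denote the number of units ordered. If s units are stocked, then the
profit, call it P(s), can be expressed as

$$
P(s) = \begin{cases}
bX - (s - X)\ell & \text{if } X \le s \\
sb & \text{if } X > s
\end{cases}
$$

Hence, the expected profit equals

$$
\begin{aligned}
E[P(s)] &= \sum_{i=0}^{s} [bi - (s - i)\ell]\,p(i) + \sum_{i=s+1}^{\infty} sb\,p(i) \\
&= (b + \ell)\sum_{i=0}^{s} i\,p(i) - s\ell\sum_{i=0}^{s} p(i) + sb\left[1 - \sum_{i=0}^{s} p(i)\right] \\
&= (b + \ell)\sum_{i=0}^{s} i\,p(i) - (b + \ell)s\sum_{i=0}^{s} p(i) + sb \\
&= sb + (b + \ell)\sum_{i=0}^{s} (i - s)\,p(i)
\end{aligned}
$$

To determine the optimum value of s, let us investigate what happens to the profit
when we increase s by 1 unit. By substitution, we see that the expected profit in this
case is given by

$$
\begin{aligned}
E[P(s + 1)] &= b(s + 1) + (b + \ell)\sum_{i=0}^{s+1} (i - s - 1)\,p(i) \\
&= b(s + 1) + (b + \ell)\sum_{i=0}^{s} (i - s - 1)\,p(i)
\end{aligned}
$$

Therefore,

$$E[P(s + 1)] - E[P(s)] = b - (b + \ell)\sum_{i=0}^{s} p(i)$$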


Example
4c

Thus, stocking s + 1 units will be better than stocking s units whenever

$$\sum_{i=0}^{s} p(i) < \frac{b}{b + \ell} \tag{4.1}$$

Because the left-hand side of Equation (4.1) is increasing in s while the right-hand
side is constant, the inequality will be satisfied for all values of $s \le s^*$, where $s^*$ is the
largest value of s satisfying Equation (4.1). Since

$$E[P(0)] < \cdots < E[P(s^*)] < E[P(s^* + 1)] > E[P(s^* + 2)] > \cdots$$

it follows that stocking $s^* + 1$ items will lead to a maximum expected profit.
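
The stocking rule can be checked against a brute-force search over s. In the Python sketch below, the demand distribution (uniform on 0, ..., 10), the profit b = 3, and the loss per unsold unit of 1 are hypothetical values chosen only for illustration, and the function names are not from the text. The rule walks s upward for as long as the cumulative probability stays below b/(b + loss).

```python
from fractions import Fraction

def expected_profit(s, pmf, b, loss):
    """E[P(s)] = s*b + (b + loss) * sum_{i=0}^{s} (i - s) p(i)."""
    return s * b + (b + loss) * sum((i - s) * p for i, p in enumerate(pmf[: s + 1]))

def optimal_stock(pmf, b, loss):
    """Increase s while sum_{i<=s} p(i) < b/(b + loss), i.e. while s + 1 beats s."""
    threshold = Fraction(b, b + loss)
    cumulative = Fraction(0)
    s = 0
    for p in pmf:
        cumulative += p
        if cumulative < threshold:
            s += 1      # stocking s + 1 units is better than stocking s
        else:
            break       # the inequality first fails here, so stop
    return s

# Hypothetical data: demand uniform on 0..10, b = 3, loss = 1 (integers).
pmf = [Fraction(1, 11)] * 11
b, loss = 3, 1

best_by_rule = optimal_stock(pmf, b, loss)
best_by_search = max(range(0, 12), key=lambda s: expected_profit(s, pmf, b, loss))
print(best_by_rule, best_by_search, expected_profit(best_by_rule, pmf, b, loss))
```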


Utility 


Suppose that you must choose one of two possible actions, each of which can result 
in any of n consequences, denoted as $C_1, \ldots, C_n$. Suppose that if the first action is
chosen, then consequence $C_i$ will result with probability $p_i$, $i = 1, \ldots, n$, whereas
if the second action is chosen, then consequence $C_i$ will result with probability $q_i$,
$i = 1, \ldots, n$, where $\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i = 1$. The following approach can be used to
determine which action to choose: Start by assigning numerical values to the different
consequences. First, identify the least and the most desirable consequences; call
them c and C, respectively; give consequence c the value 0 and give C the value 1.
Now consider any of the other n - 2 consequences, say, $C_i$. To value this
consequence, imagine that you are given the choice between either receiving $C_i$ or taking
part in a random experiment that either earns you consequence C with probability
u or consequence c with probability 1 - u. Clearly, your choice will depend on
the value of u. On the one hand, if u = 1, then the experiment is certain to result
in consequence C, and since C is the most desirable consequence, you will prefer
participating in the experiment to receiving $C_i$. On the other hand, if u = 0, then
the experiment will result in the least desirable consequence, namely, c, so in this
case you will prefer the consequence $C_i$ to participating in the experiment. Now,
as u decreases from 1 to 0, it seems reasonable that your choice will at some point
switch from participating in the experiment to the certain return of $C_i$, and at that
critical switch point you will be indifferent between the two alternatives. Take that
indifference probability u as the value of the consequence $C_i$. In other words, the
value of $C_i$ is that probability u such that you are indifferent between either
receiving the consequence $C_i$ or taking part in an experiment that returns consequence C
with probability u or consequence c with probability 1 - u. We call this indifference
probability the utility of the consequence $C_i$, and we designate it as $u(C_i)$.

To determine which action is superior, we need to evaluate each one. Consider 
the first action, which results in consequence $C_i$ with probability $p_i$, $i = 1, \ldots, n$. We
can think of the result of this action as being determined by a two-stage experiment.
In the first stage, one of the values 1, ..., n is chosen according to the probabilities
$p_1, \ldots, p_n$; if value i is chosen, you receive consequence $C_i$. However, since $C_i$ is
equivalent to obtaining consequence C with probability $u(C_i)$ or consequence c with
probability $1 - u(C_i)$, it follows that the result of the two-stage experiment is
equivalent to an experiment in which either consequence C or consequence c is obtained,

Corollary 4.1


4.5 Variance 


with C being obtained with probability 


n 
Y= pit(Ci) 
i=1 


Similarly, the result of choosing the second action is equivalent to taking part in an 
experiment in which either consequence C or consequence c is obtained, with C 
being obtained with probability 


n 
Yo qiu(Ci) 
i=), 


Since C is preferable to c, it follows that the first action is preferable to the 
second action if 


n n 
Yo pic) > Yo qiu(Ci) 
i=1 i=1 


In other words, the worth of an action can be measured by the expected value of the 
utility of its consequence, and the action with the largest expected utility is the most 
preferable. O 


A simple logical consequence of Proposition 4.1 is Corollary 4.1. 


If a and b are constants, then 
ElaX + b] =aE[X] + b 
Proof 


ElaX + bl = > (ax + b)p(x) 


xip(x)>0 

=a )> xpx)+b D> pr) 
xip(x)>0 xip(x)>0 

= aE|X] + b 


The expected value of a random variable X, E[X’], is also referred to as the mean 
or the first moment of X. The quantity E[X”],n = 1, is called the nth moment of X. 
By Proposition 4.1, we note that 


E[X"] = x, x"p(x) 


xip(x)>0 


Given a random variable X along with its distribution function F, it would be
extremely useful if we were able to summarize the essential properties of F by
certain suitably defined measures. One such measure would be E[X], the expected value
of X. However, although E[X] yields the weighted average of the possible values of
X, it does not tell us anything about the variation, or spread, of these values. For
instance, although random variables W, Y, and Z having probability mass functions
determined by

\[ W = 0 \quad \text{with probability } 1 \]

\[ Y = \begin{cases} -1 & \text{with probability } \tfrac{1}{2} \\ +1 & \text{with probability } \tfrac{1}{2} \end{cases} \]

\[ Z = \begin{cases} -100 & \text{with probability } \tfrac{1}{2} \\ +100 & \text{with probability } \tfrac{1}{2} \end{cases} \]

all have the same expectation (namely, 0), there is a much greater spread in the
possible values of Y than in those of W (which is a constant) and in the possible
values of Z than in those of Y.

Because we expect X to take on values around its mean E[X], it would appear
that a reasonable way of measuring the possible variation of X would be to look
at how far apart X would be from its mean, on the average. One possible way to
measure this variation would be to consider the quantity $E[|X - \mu|]$, where
$\mu = E[X]$. However, it turns out to be mathematically inconvenient to deal with this
quantity, so a more tractable quantity is usually considered: the expectation
of the square of the difference between X and its mean. We thus have the following
definition.


Definition 
If X is a random variable with mean $\mu$, then the variance of X, denoted by
Var(X), is defined by

\[ \mathrm{Var}(X) = E[(X - \mu)^2] \]


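An alternative formula for Var(X) is derived as follows:

\[
\begin{aligned}
\mathrm{Var}(X) &= E[(X - \mu)^2] \\
  &= \sum_{x} (x - \mu)^2 p(x) \\
  &= \sum_{x} (x^2 - 2\mu x + \mu^2)\,p(x) \\
  &= \sum_{x} x^2 p(x) - 2\mu \sum_{x} x\,p(x) + \mu^2 \sum_{x} p(x) \\
  &= E[X^2] - 2\mu^2 + \mu^2 \\
  &= E[X^2] - \mu^2
\end{aligned}
\]

That is,

\[ \mathrm{Var}(X) = E[X^2] - (E[X])^2 \]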


In words, the variance of X is equal to the expected value of $X^2$ minus the square
of its expected value. In practice, this formula frequently offers the easiest way to
compute Var(X).
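
The two expressions for the variance can be compared numerically. The following is a minimal Python sketch, not part of the text: the helper names `mean`, `var_by_definition`, and `var_by_shortcut` are illustrative choices, and a probability mass function is assumed to be stored as a dictionary mapping each value to its probability. Exact fractions are used so that the two formulas can be compared without rounding error.

```python
from fractions import Fraction

def mean(pmf):
    """E[X] for a pmf stored as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def var_by_definition(pmf):
    """Var(X) = E[(X - mu)^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def var_by_shortcut(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    second_moment = sum(x ** 2 * p for x, p in pmf.items())
    return second_moment - mean(pmf) ** 2

half = Fraction(1, 2)
W = {0: Fraction(1)}                    # the constant 0
Y = {-1: half, 1: half}
Z = {-100: half, 100: half}
die = {face: Fraction(1, 6) for face in range(1, 7)}   # a fair die

for name, pmf in [("W", W), ("Y", Y), ("Z", Z), ("die", die)]:
    print(name, mean(pmf), var_by_definition(pmf), var_by_shortcut(pmf))
```

W, Y, and Z all have mean 0, but their variances (0, 1, and 10000) separate them, which is the spread that the expectation alone does not capture.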


Example 5a
Calculate Var(X) if X represents the outcome when a fair die is rolled.


Solution It was shown in Example 3a that $E[X] = \frac{7}{2}$. Also,

\[
E[X^2] = 1^2\left(\tfrac{1}{6}\right) + 2^2\left(\tfrac{1}{6}\right) + 3^2\left(\tfrac{1}{6}\right)
       + 4^2\left(\tfrac{1}{6}\right) + 5^2\left(\tfrac{1}{6}\right) + 6^2\left(\tfrac{1}{6}\right)
       = \left(\tfrac{1}{6}\right)(91)
\]

Hence,

\[ \mathrm{Var}(X) = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12} \]

Because $\mathrm{Var}(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2 P(X = x)$ is the sum of nonnegative
terms, it follows that $\mathrm{Var}(X) \ge 0$ or, equivalently, that

\[ E[X^2] \ge (E[X])^2 \]

That is, the expected value of the square of a random variable is at least as large as
the square of its expected value.


Example 5b
The friendship paradox is often expressed as saying that on average your friends
have more friends than you do. More formally, suppose that there are n people in a
certain population, labeled $1, 2, \ldots, n$, and that certain pairs of these individuals are
friends. This friendship network can be graphically represented by having a circle for
each person and then having a line between circles to indicate that those people are
friends. For instance, Figure 4.5 indicates that there are 4 people in the community
and that persons 1 and 2 are friends, persons 1 and 3 are friends, persons 1 and 4 are
friends, and persons 2 and 4 are friends.

Let $f(i)$ denote the number of friends of person i and let $f = \sum_{i=1}^{n} f(i)$. (Thus,
for the network of Figure 4.5, f(1) = 3, f(2) = 2, f(3) = 1, f(4) = 2 and f = 8.) Now,
let X be a randomly chosen individual, equally likely to be any of $1, 2, \ldots, n$. That is,

\[ P(X = i) = 1/n, \quad i = 1, \ldots, n. \]

Figure 4.5 A Friendship Graph

Letting $g(i) = f(i)$ in Proposition 4.1, it follows that $E[f(X)]$, the expected number
of friends of X, is

\[ E[f(X)] = \sum_{i=1}^{n} f(i) P(X = i) = \sum_{i=1}^{n} f(i)/n = f/n \]

Also, letting $g(i) = f^2(i)$, it follows from Proposition 4.1 that $E[f^2(X)]$, the expected
value of the square of the number of friends of X, is

\[ E[f^2(X)] = \sum_{i=1}^{n} f^2(i) P(X = i) = \sum_{i=1}^{n} f^2(i)/n \]

Consequently, we see that

\[ \frac{E[f^2(X)]}{E[f(X)]} = \frac{\sum_{i=1}^{n} f^2(i)}{f} \tag{5.1} \]

Now suppose that each of the n individuals writes the names of all their friends,
with each name written on a separate sheet of paper. Thus, an individual with k
friends will use k separate sheets. Because person i has $f(i)$ friends, there will be
$f = \sum_{i=1}^{n} f(i)$ separate sheets of paper, with each sheet containing one of the n
names. Now choose one of these sheets at random and let Y denote the name on
that sheet. Let us compute $E[f(Y)]$, the expected number of friends of the person
whose name is on the chosen sheet. Now, because person i has $f(i)$ friends, it follows
that i is the name on $f(i)$ of the sheets, and thus i is the name on the chosen sheet
with probability $f(i)/f$. That is,

\[ P(Y = i) = \frac{f(i)}{f}, \quad i = 1, \ldots, n \]

Consequently,

\[ E[f(Y)] = \sum_{i=1}^{n} f(i) P(Y = i) = \sum_{i=1}^{n} f^2(i)/f \tag{5.2} \]

Thus, from (5.1), we see that

\[ E[f(Y)] = \frac{E[f^2(X)]}{E[f(X)]} \ge E[f(X)] \]

where the inequality follows because the expected value of the square of any random
variable is always at least as large as the square of its expectation. Thus,
$E[f(X)] \le E[f(Y)]$, which says that the average number of friends that a randomly chosen
individual has is less than (or equal to if all the individuals have the same number of
friends) the average number of friends of a randomly chosen friend.


Remark The intuitive reason for the friendship paradox is that X is equally likely
to be any of the n individuals. On the other hand, Y is chosen with a probability
proportional to its number of friends; that is, the more friends an individual has, the
more likely that individual will be Y. Thus, Y is biased towards individuals with a
large number of friends, and so it is not surprising that the average number of friends
that Y has is larger than the average number of friends that X has.
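
The two averages can be computed directly for the four-person network of Figure 4.5. This is a small Python sketch for illustration only; the variable names (`edges`, `e_fx`, `e_fy`) are assumptions made here, with the friendship pairs taken from the description of the figure.

```python
from fractions import Fraction

# Friendship pairs described for Figure 4.5: 1-2, 1-3, 1-4, 2-4.
edges = [(1, 2), (1, 3), (1, 4), (2, 4)]
people = [1, 2, 3, 4]

# friends[i] plays the role of f(i), the number of friends of person i.
friends = {i: 0 for i in people}
for a, b in edges:
    friends[a] += 1
    friends[b] += 1
total = sum(friends.values())          # the quantity f in the text

# X is uniform on the people: P(X = i) = 1/n.
e_fx = sum(friends[i] * Fraction(1, len(people)) for i in people)

# Y is the name on a random sheet: P(Y = i) = f(i)/f.
e_fy = sum(friends[i] * Fraction(friends[i], total) for i in people)

print("E[f(X)] =", e_fx)
print("E[f(Y)] =", e_fy)
```

For this network a randomly chosen person has 2 friends on average, while the person named on a randomly chosen sheet has 9/4.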
The following is a further example illustrating the usefulness of the inequality
that the expected value of a square is at least as large as the square of the expected
value.


Example 5c
Suppose there are m days in a year, and that each person is independently born on
day r with probability $p_r$, $r = 1, \ldots, m$, $\sum_{r=1}^{m} p_r = 1$. Let $A_{i,j}$ be the event that persons
i and j are born on the same day.

(a) Find $P(A_{1,3})$

(b) Find $P(A_{1,3} \mid A_{1,2})$

(c) Show $P(A_{1,3} \mid A_{1,2}) \ge P(A_{1,3})$


Solution 


(a) Because the event that 1 and 3 have the same birthday is the union of the m
mutually exclusive events that they were both born on day r, $r = 1, \ldots, m$, we
have that

\[ P(A_{1,3}) = \sum_{r} p_r^2. \]

(b) Using the definition of conditional probability we obtain that

\[ P(A_{1,3} \mid A_{1,2}) = \frac{P(A_{1,2} A_{1,3})}{P(A_{1,2})} = \frac{\sum_{r} p_r^3}{\sum_{r} p_r^2} \]

where the preceding used that $A_{1,2} A_{1,3}$ is the union of the m mutually exclusive
events that 1, 2, 3 were all born on day r, $r = 1, \ldots, m$.

(c) It follows from parts (a) and (b) that $P(A_{1,3} \mid A_{1,2}) \ge P(A_{1,3})$ is equivalent to
$\sum_{r} p_r^3 \ge \left(\sum_{r} p_r^2\right)^2$. To prove this inequality, let X be a random variable that is
equal to $p_r$ with probability $p_r$. That is, $P(X = p_r) = p_r$, $r = 1, \ldots, m$. Then

\[ E[X] = \sum_{r} p_r^2, \qquad E[X^2] = \sum_{r} p_r^3 \]

and the result follows because $E[X^2] \ge (E[X])^2$.


Remark The intuitive reason for why part (c) is true is that if the “popular days” 
are the ones whose probabilities are relatively large, then knowing that 1 and 2 share 
the same birthday makes it more likely (than when we have no information) that the 
birthday of 1 is a popular day and that makes it more likely that 3 will have the same 
birthday as does 1. 
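
Parts (a) through (c) can be checked on a small case. The sketch below is illustrative only: the function name `same_birthday_probs` and the two four-day "years" are made-up inputs, not data from the text.

```python
from fractions import Fraction

def same_birthday_probs(p):
    """Return (P(A13), P(A13 | A12)) for day probabilities p that sum to 1."""
    assert sum(p) == 1
    sum_sq = sum(q ** 2 for q in p)       # P(A13), which also equals P(A12)
    sum_cube = sum(q ** 3 for q in p)     # P(A12 A13): all three share a day
    return sum_sq, sum_cube / sum_sq

# Hypothetical years with m = 4 days.
skewed = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
uniform = [Fraction(1, 4)] * 4

print(same_birthday_probs(skewed))
print(same_birthday_probs(uniform))
```

With the skewed probabilities the conditional probability exceeds the unconditional one, and with equally likely days the two coincide, which is the equality case of the inequality in part (c).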


A useful identity is that for any constants a and b,

\[ \mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X) \]

To prove this equality, let $\mu = E[X]$ and note from Corollary 4.1 that
$E[aX + b] = a\mu + b$. Therefore,

\[
\begin{aligned}
\mathrm{Var}(aX + b) &= E[(aX + b - a\mu - b)^2] \\
  &= E[a^2 (X - \mu)^2] \\
  &= a^2 E[(X - \mu)^2] \\
  &= a^2\,\mathrm{Var}(X)
\end{aligned}
\]

Remarks (a) Analogous to the mean being the center of gravity of a distribution
of mass, the variance represents, in the terminology of mechanics, the moment of
inertia.

(b) The square root of Var(X) is called the standard deviation of X, and we
denote it by SD(X). That is,

\[ \mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)} \]


Discrete random variables are often classified according to their probability mass 
functions. In the next few sections, we consider some of the more common types. 


4.6 The Bernoulli and Binomial Random Variables 


Suppose that a trial, or an experiment, whose outcome can be classified as either a 
success or a failure is performed. If we let X = 1 when the outcome is a success and 
X = 0 when it is a failure, then the probability mass function of X is given by 


\[
\begin{aligned}
p(0) &= P\{X = 0\} = 1 - p \\
p(1) &= P\{X = 1\} = p
\end{aligned}
\tag{6.1}
\]

where p, $0 \le p \le 1$, is the probability that the trial is a success.

A random variable X is said to be a Bernoulli random variable (after the Swiss
mathematician James Bernoulli) if its probability mass function is given by
Equations (6.1) for some $p \in (0, 1)$.

Suppose now that n independent trials, each of which results in a success with
probability p or in a failure with probability $1 - p$, are to be performed. If X
represents the number of successes that occur in the n trials, then X is said to be a binomial
random variable with parameters (n, p). Thus, a Bernoulli random variable is just a
binomial random variable with parameters (1, p).

The probability mass function of a binomial random variable having parameters
(n, p) is given by

\[ p(i) = \binom{n}{i} p^i (1 - p)^{n-i}, \quad i = 0, 1, \ldots, n \tag{6.2} \]

The validity of Equation (6.2) may be verified by first noting that the probability of
any particular sequence of n outcomes containing i successes and $n - i$ failures is, by
the assumed independence of trials, $p^i (1 - p)^{n-i}$. Equation (6.2) then follows, since
there are $\binom{n}{i}$ different sequences of the n outcomes leading to i successes and
$n - i$ failures. This perhaps can most easily be seen by noting that there are $\binom{n}{i}$
different choices of the i trials that result in successes. For instance, if n = 4, i = 2,
then there are $\binom{4}{2} = 6$ ways in which the four trials can result in two successes,
namely, any of the outcomes (s, s, f, f), (s, f, s, f), (s, f, f, s), (f, s, s, f), (f, s, f, s), and
(f, f, s, s), where the outcome (s, s, f, f) means, for instance, that the first two trials
are successes and the last two failures. Since each of these outcomes has probability
$p^2 (1 - p)^2$ of occurring, the desired probability of two successes in the four trials
is $\binom{4}{2} p^2 (1 - p)^2$.


Note that, by the binomial theorem, the probabilities sum to 1; that is,

\[ \sum_{i=0}^{n} p(i) = \sum_{i=0}^{n} \binom{n}{i} p^i (1 - p)^{n-i} = [p + (1 - p)]^n = 1 \]


Example 6a
Five fair coins are flipped. If the outcomes are assumed independent, find the
probability mass function of the number of heads obtained.


Solution If we let X equal the number of heads (successes) that appear, then X
is a binomial random variable with parameters (n = 5, p = 1/2). Hence, by
Equation (6.2),

\[
\begin{aligned}
P\{X = 0\} &= \binom{5}{0} \left(\tfrac{1}{2}\right)^0 \left(\tfrac{1}{2}\right)^5 = \tfrac{1}{32} \\
P\{X = 1\} &= \binom{5}{1} \left(\tfrac{1}{2}\right)^1 \left(\tfrac{1}{2}\right)^4 = \tfrac{5}{32} \\
P\{X = 2\} &= \binom{5}{2} \left(\tfrac{1}{2}\right)^2 \left(\tfrac{1}{2}\right)^3 = \tfrac{10}{32} \\
P\{X = 3\} &= \binom{5}{3} \left(\tfrac{1}{2}\right)^3 \left(\tfrac{1}{2}\right)^2 = \tfrac{10}{32} \\
P\{X = 4\} &= \binom{5}{4} \left(\tfrac{1}{2}\right)^4 \left(\tfrac{1}{2}\right)^1 = \tfrac{5}{32} \\
P\{X = 5\} &= \binom{5}{5} \left(\tfrac{1}{2}\right)^5 \left(\tfrac{1}{2}\right)^0 = \tfrac{1}{32}
\end{aligned}
\]


Example 6b
It is known that screws produced by a certain company will be defective with
probability .01, independently of one another. The company sells the screws in packages
of 10 and offers a money-back guarantee that at most 1 of the 10 screws is defective.
What proportion of packages sold must the company replace?


Solution If X is the number of defective screws in a package, then X is a binomial
random variable with parameters (10, .01). Hence, the probability that a package
will have to be replaced is

\[
1 - P\{X = 0\} - P\{X = 1\} = 1 - \binom{10}{0} (.01)^0 (.99)^{10} - \binom{10}{1} (.01)^1 (.99)^9 \approx .004
\]

Thus, only .4 percent of the packages will have to be replaced.
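
The computations in Examples 6a and 6b follow directly from Equation (6.2). Below is a minimal Python sketch under the assumption that a helper named `binomial_pmf` (an illustrative name, not from the text) evaluates that equation with the standard-library function `math.comb`.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(i, n, p):
    """P{X = i} for a binomial (n, p) random variable, Equation (6.2)."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

# Example 6a: number of heads in five fair coin flips, kept exact with fractions.
coins = [binomial_pmf(i, 5, Fraction(1, 2)) for i in range(6)]
print(coins)              # fractions are printed in lowest terms, e.g. 10/32 as 5/16
print(sum(coins))         # the probabilities sum to 1

# Example 6b: a package is replaced when 2 or more of its 10 screws are defective.
replace = 1 - binomial_pmf(0, 10, 0.01) - binomial_pmf(1, 10, 0.01)
print(round(replace, 4))
```

Passing a `Fraction` for p keeps the arithmetic exact, while passing a float, as in the screw example, gives the usual decimal approximation.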


Example 6c
The following gambling game, known as the wheel of fortune (or chuck-a-luck), is
quite popular at many carnivals and gambling casinos: A player bets on one of the
numbers 1 through 6. Three dice are then rolled, and if the number bet by the player
appears i times, i = 1, 2, 3, then the player wins i units; if the number bet by the player
does not appear on any of the dice, then the player loses 1 unit. Is this game fair to
the player? (Actually, the game is played by spinning a wheel that comes to rest on
a slot labeled by three of the numbers 1 through 6, but this variant is mathematically
equivalent to the dice version.)


Solution If we assume that the dice are fair and act independently of one another,
then the number of times that the number bet appears is a binomial random variable
with parameters $\left(3, \tfrac{1}{6}\right)$. Hence, letting X denote the player's winnings in the game,
we have

\[
\begin{aligned}
P\{X = -1\} &= \binom{3}{0} \left(\tfrac{1}{6}\right)^0 \left(\tfrac{5}{6}\right)^3 = \tfrac{125}{216} \\
P\{X = 1\} &= \binom{3}{1} \left(\tfrac{1}{6}\right)^1 \left(\tfrac{5}{6}\right)^2 = \tfrac{75}{216} \\
P\{X = 2\} &= \binom{3}{2} \left(\tfrac{1}{6}\right)^2 \left(\tfrac{5}{6}\right)^1 = \tfrac{15}{216} \\
P\{X = 3\} &= \binom{3}{3} \left(\tfrac{1}{6}\right)^3 \left(\tfrac{5}{6}\right)^0 = \tfrac{1}{216}
\end{aligned}
\]

In order to determine whether or not this is a fair game for the player, let us
calculate E[X]. From the preceding probabilities, we obtain

\[ E[X] = \frac{-125 + 75 + 30 + 3}{216} = \frac{-17}{216} \]

Hence, in the long run, the player will lose 17 units per every 216 games he plays.


In the next example, we consider the simplest form of the theory of inheritance
as developed by Gregor Mendel (1822-1884).


Example 6d
Suppose that a particular trait (such as eye color or left-handedness) of a person is
classified on the basis of one pair of genes, and suppose also that d represents a
dominant gene and r a recessive gene. Thus, a person with dd genes is purely dominant,
one with rr is purely recessive, and one with rd is hybrid. The purely dominant and
the hybrid individuals are alike in appearance. Children receive 1 gene from each
parent. If, with respect to a particular trait, 2 hybrid parents have a total of 4 children,
what is the probability that 3 of the 4 children have the outward appearance of the
dominant gene?


Figure 4.6 (a) Crossing pure yellow seeds with pure green seeds; (b) Crossing hybrid
first-generation seeds.

The preceding Figure 4.6a and b shows what can happen when hybrid yellow
(dominant) and green (recessive) seeds are crossed.


Solution If we assume that each child is equally likely to inherit either of 2 genes
from each parent, the probabilities that the child of 2 hybrid parents will have dd,
rr, and rd pairs of genes are, respectively, $\tfrac{1}{4}$, $\tfrac{1}{4}$, and $\tfrac{1}{2}$. Hence, since an offspring will
have the outward appearance of the dominant gene if its gene pair is either dd or rd,
it follows that the number of such children is binomially distributed with parameters
$\left(4, \tfrac{3}{4}\right)$. Thus, the desired probability is

\[ \binom{4}{3} \left(\tfrac{3}{4}\right)^3 \left(\tfrac{1}{4}\right)^1 = \tfrac{27}{64} \]


Example 6e
Consider a jury trial in which it takes 8 of the 12 jurors to convict the defendant;
that is, in order for the defendant to be convicted, at least 8 of the jurors must vote
him guilty. If we assume that jurors act independently and that whether or not the
defendant is guilty, each makes the right decision with probability $\theta$, what is the
probability that the jury renders a correct decision?


Solution The problem, as stated, is incapable of solution, for there is not yet enough
information. For instance, if the defendant is innocent, the probability of the jury
rendering a correct decision is

\[ \sum_{i=5}^{12} \binom{12}{i} \theta^i (1 - \theta)^{12-i} \]

whereas, if he is guilty, the probability of a correct decision is

\[ \sum_{i=8}^{12} \binom{12}{i} \theta^i (1 - \theta)^{12-i} \]

Therefore, if $\alpha$ represents the probability that the defendant is guilty, then, by
conditioning on whether or not he is guilty, we obtain the probability that the jury renders
a correct decision:

\[
\alpha \sum_{i=8}^{12} \binom{12}{i} \theta^i (1 - \theta)^{12-i}
+ (1 - \alpha) \sum_{i=5}^{12} \binom{12}{i} \theta^i (1 - \theta)^{12-i}
\]

Example 6f
A communication system consists of n components, each of which will, independently,
function with probability p. The total system will be able to operate effectively
if at least one-half of its components function.


(a) For what values of p is a 5-component system more likely to operate effectively
than a 3-component system?


(b) In general, when is a $(2k + 1)$-component system better than a $(2k - 1)$-component
system?


Solution (a) Because the number of functioning components is a binomial random
variable with parameters (n, p), it follows that the probability that a 5-component
system will be effective is

\[ \binom{5}{3} p^3 (1 - p)^2 + \binom{5}{4} p^4 (1 - p) + p^5 \]

whereas the corresponding probability for a 3-component system is

\[ \binom{3}{2} p^2 (1 - p) + p^3 \]

Hence, the 5-component system is better if

\[ 10 p^3 (1 - p)^2 + 5 p^4 (1 - p) + p^5 > 3 p^2 (1 - p) + p^3 \]

which reduces to

\[ 3 (p - 1)^2 (2p - 1) > 0 \]

or

\[ p > \tfrac{1}{2} \]

(b) In general, a system with $2k + 1$ components will be better than one with
$2k - 1$ components if (and only if) $p > \tfrac{1}{2}$. To prove this, consider a system of $2k + 1$
components and let X denote the number of the first $2k - 1$ that function. Then

\[
P_{2k+1}(\text{effective}) = P\{X \ge k + 1\} + P\{X = k\}\,(1 - (1 - p)^2) + P\{X = k - 1\}\,p^2
\]

which follows because the $(2k + 1)$-component system will be effective if either
(i) $X \ge k + 1$;
(ii) $X = k$ and at least one of the remaining 2 components function; or
(iii) $X = k - 1$ and both of the next 2 components function.

Since

\[
P_{2k-1}(\text{effective}) = P\{X \ge k\} = P\{X = k\} + P\{X \ge k + 1\}
\]

we obtain

\[
\begin{aligned}
& P_{2k+1}(\text{effective}) - P_{2k-1}(\text{effective}) \\
&\quad = P\{X = k - 1\}\,p^2 - (1 - p)^2 P\{X = k\} \\
&\quad = \binom{2k-1}{k-1} p^{k-1} (1 - p)^k p^2 - (1 - p)^2 \binom{2k-1}{k} p^k (1 - p)^{k-1} \\
&\quad = \binom{2k-1}{k} p^k (1 - p)^k \,[\,p - (1 - p)\,] \qquad \text{since } \binom{2k-1}{k-1} = \binom{2k-1}{k} \\
&\quad > 0 \iff p > \tfrac{1}{2}
\end{aligned}
\]

4.6.1 Properties of Binomial Random Variables 


We will now examine the properties of a binomial random variable with parameters
n and p. To begin, let us compute its expected value and variance. Note that

\[
E[X^k] = \sum_{i=0}^{n} i^k \binom{n}{i} p^i (1 - p)^{n-i}
       = \sum_{i=1}^{n} i^k \binom{n}{i} p^i (1 - p)^{n-i}
\]

Using the identity

\[ i \binom{n}{i} = n \binom{n-1}{i-1} \]

gives

\[
\begin{aligned}
E[X^k] &= np \sum_{i=1}^{n} i^{k-1} \binom{n-1}{i-1} p^{i-1} (1 - p)^{n-i} \\
       &= np \sum_{j=0}^{n-1} (j + 1)^{k-1} \binom{n-1}{j} p^j (1 - p)^{n-1-j} \qquad \text{by letting } j = i - 1 \\
       &= np\,E[(Y + 1)^{k-1}]
\end{aligned}
\]

where Y is a binomial random variable with parameters $n - 1$, p. Setting k = 1 in
the preceding equation yields

\[ E[X] = np \]

That is, the expected number of successes that occur in n independent trials when
each is a success with probability p is equal to np. Setting k = 2 in the preceding
equation and using the preceding formula for the expected value of a binomial
random variable yields

\[
\begin{aligned}
E[X^2] &= np\,E[Y + 1] \\
       &= np[(n - 1)p + 1]
\end{aligned}
\]

Since E[X] = np, we obtain

\[
\begin{aligned}
\mathrm{Var}(X) &= E[X^2] - (E[X])^2 \\
  &= np[(n - 1)p + 1] - (np)^2 \\
  &= np(1 - p)
\end{aligned}
\]

Summing up, we have shown the following:
If X is a binomial random variable with parameters n and p, then

\[ E[X] = np \]
\[ \mathrm{Var}(X) = np(1 - p) \]

The following proposition details how the binomial probability mass function
first increases and then decreases.


Proposition 6.1
If X is a binomial random variable with parameters (n, p), where $0 < p < 1$, then
as k goes from 0 to n, $P\{X = k\}$ first increases monotonically and then decreases
monotonically, reaching its largest value when k is the largest integer less than or
equal to $(n + 1)p$.


Proof We prove the proposition by considering $P\{X = k\}/P\{X = k - 1\}$ and
determining for what values of k it is greater or less than 1. Now,

\[
\frac{P\{X = k\}}{P\{X = k - 1\}}
= \frac{\dfrac{n!}{(n - k)!\,k!}\, p^k (1 - p)^{n-k}}{\dfrac{n!}{(n - k + 1)!\,(k - 1)!}\, p^{k-1} (1 - p)^{n-k+1}}
= \frac{(n - k + 1)p}{k(1 - p)}
\]

Hence, $P\{X = k\} \ge P\{X = k - 1\}$ if and only if

\[ (n - k + 1)p \ge k(1 - p) \]

or, equivalently, if and only if

\[ k \le (n + 1)p \]

and the proposition is proved.


As an illustration of Proposition 6.1, consider Figure 4.7, the graph of the
probability mass function of a binomial random variable with parameters $\left(10, \tfrac{1}{2}\right)$.

Figure 4.7 Graph of $p(k) = \binom{10}{k} \left(\tfrac{1}{2}\right)^{10}$


Example 6g
In a U.S. presidential election, the candidate who gains the maximum number of
votes in a state is awarded the total number of electoral college votes allocated to
that state. The number of electoral college votes of a given state is roughly
proportional to the population of that state; that is, a state with population n has roughly
nc electoral votes. (Actually, it is closer to nc + 2, as a state is given an electoral
vote for each member it has in the House of Representatives, with the number of
such representatives being roughly proportional to the population of the state, and
one electoral college vote for each of its two senators.) Let us determine the average
power of a citizen in a state of size n in a close presidential election, where, by
average power in a close election, we mean that a voter in a state of size n = 2k + 1 will be
decisive if the other $n - 1$ voters split their votes evenly between the two candidates.
(We are assuming here that n is odd, but the case where n is even is quite similar.)


Because the election is close, we shall suppose that each of the other $n - 1 = 2k$
voters acts independently and is equally likely to vote for either candidate. Hence,
the probability that a voter in a state of size n = 2k + 1 will make a difference to the
outcome is the same as the probability that 2k tosses of a fair coin land heads and
tails an equal number of times. That is,

\[
P\{\text{voter in state of size } 2k + 1 \text{ makes a difference}\}
= \binom{2k}{k} \left(\tfrac{1}{2}\right)^k \left(\tfrac{1}{2}\right)^k
= \frac{(2k)!}{k!\,k!\,2^{2k}}
\]

To approximate the preceding equality, we make use of Stirling's approximation,
which says that for k large,

\[ k! \sim k^{k+1/2} e^{-k} \sqrt{2\pi} \]

where we say that $a_k \sim b_k$ when the ratio $a_k/b_k$ approaches 1 as k approaches $\infty$.
Hence, it follows that

\[
P\{\text{voter in state of size } 2k + 1 \text{ makes a difference}\}
\sim \frac{(2k)^{2k+1/2} e^{-2k} \sqrt{2\pi}}{k^{2k+1} e^{-2k} (2\pi)\, 2^{2k}}
= \frac{1}{\sqrt{k\pi}}
\]

Because such a voter (if he or she makes a difference) will affect nc electoral votes,
the expected number of electoral votes a voter in a state of size n will affect, or the
voter's average power, is given by

\[
\text{average power} = nc\,P\{\text{makes a difference}\}
\sim \frac{nc}{\sqrt{n\pi/2}} = c\sqrt{2n/\pi}
\]

where the approximation uses $k = (n - 1)/2 \approx n/2$.
Thus, the average power of a voter in a state of size n is proportional to the square
root of n, showing that in presidential elections, voters in large states have more
power than do those in smaller states.


4.6.2 Computing the Binomial Distribution Function


Suppose that X is binomial with parameters (n, p). The key to computing its
distribution function

\[ P\{X \le i\} = \sum_{k=0}^{i} \binom{n}{k} p^k (1 - p)^{n-k}, \quad i = 0, 1, \ldots, n \]

is to utilize the following relationship between $P\{X = k + 1\}$ and $P\{X = k\}$, which
was established in the proof of Proposition 6.1:

\[ P\{X = k + 1\} = \frac{p}{1 - p} \, \frac{n - k}{k + 1} \, P\{X = k\} \tag{6.3} \]


Example 6h
Let X be a binomial random variable with parameters n = 6, p = .4. Then, starting
with $P\{X = 0\} = (.6)^6$ and recursively employing Equation (6.3), we obtain

\[
\begin{aligned}
P\{X = 0\} &= (.6)^6 \approx .0467 \\
P\{X = 1\} &= \tfrac{4}{6} \cdot \tfrac{6}{1}\, P\{X = 0\} \approx .1866 \\
P\{X = 2\} &= \tfrac{4}{6} \cdot \tfrac{5}{2}\, P\{X = 1\} \approx .3110 \\
P\{X = 3\} &= \tfrac{4}{6} \cdot \tfrac{4}{3}\, P\{X = 2\} \approx .2765 \\
P\{X = 4\} &= \tfrac{4}{6} \cdot \tfrac{3}{4}\, P\{X = 3\} \approx .1382 \\
P\{X = 5\} &= \tfrac{4}{6} \cdot \tfrac{2}{5}\, P\{X = 4\} \approx .0369 \\
P\{X = 6\} &= \tfrac{4}{6} \cdot \tfrac{1}{6}\, P\{X = 5\} \approx .0041
\end{aligned}
\]


A computer program that utilizes the recursion (6.3) to compute the binomial
distribution function is easily written. To compute $P\{X \le i\}$, the program should
first compute $P\{X = i\}$ and then use the recursion to successively compute
$P\{X = i - 1\}$, $P\{X = i - 2\}$, and so on.
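
One way such a program might look is sketched below in Python. It is a simplified variant rather than the exact procedure just described: it starts from $P\{X = 0\} = (1 - p)^n$ and applies the recursion (6.3) upward, as in Example 6h, which is adequate for small n but can underflow when n is very large. The function names are illustrative. As a by-product, the computed table is used to confirm the mean np, the variance np(1 - p), and the location of the largest probability stated in Section 4.6.1.

```python
def binomial_pmf_table(n, p):
    """P{X = 0}, ..., P{X = n} via the recursion (6.3); requires 0 < p < 1."""
    probs = [(1 - p) ** n]                       # P{X = 0}
    for k in range(n):
        # P{X = k+1} = (p / (1 - p)) * ((n - k) / (k + 1)) * P{X = k}
        probs.append(probs[-1] * (p / (1 - p)) * (n - k) / (k + 1))
    return probs

def binomial_cdf(i, n, p):
    """P{X <= i}, obtained by summing the recursively computed probabilities."""
    return sum(binomial_pmf_table(n, p)[: i + 1])

table = binomial_pmf_table(6, 0.4)               # the setting of Example 6h
print([round(q, 4) for q in table])

mean = sum(k * q for k, q in enumerate(table))
var = sum(k * k * q for k, q in enumerate(table)) - mean ** 2
mode = max(range(len(table)), key=lambda k: table[k])
print(mean, var, mode)                           # compare with np, np(1-p), floor((n+1)p)
```

For n = 6 and p = .4 the table reproduces the seven values of Example 6h, and the mode falls at k = 2, the largest integer not exceeding (n + 1)p = 2.8.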


Historical note 

Independent trials having a common probability of success p were first stud- 
ied by the Swiss mathematician Jacques Bernoulli (1654-1705). In his book Ars 
Conjectandi (The Art of Conjecturing), published by his nephew Nicholas eight 
years after his death in 1713, Bernoulli showed that if the number of such trials 
were large, then the proportion of them that were successes would be close to p 
with a probability near 1.

Jacques Bernoulli was from the first generation of the most famous mathematical
family of all time. Altogether, there were between 8 and 12 Bernoullis,
spread over three generations, who made fundamental contributions to proba- 
bility, statistics, and mathematics. One difficulty in knowing their exact number 
is the fact that several had the same name. (For example, two of the sons of 
Jacques’s brother Jean were named Jacques and Jean.) Another difficulty is that 
several of the Bernoullis were known by different names in different places. 
Our Jacques (sometimes written Jaques) was, for instance, also known as Jakob 
(sometimes written Jacob) and as James Bernoulli. But whatever their num- 
ber, their influence and output were prodigious. Like the Bachs of music, the 
Bernoullis of mathematics were a family for the ages! 


Example 6i
If X is a binomial random variable with parameters n = 100 and p = .75, find
$P\{X = 70\}$ and $P\{X \le 70\}$.

Solution A binomial calculator can be used to obtain the following solutions:

Probability (Number of Successes = i) = .04575381

Probability (Number of Successes <= i) = .14954105

Figure 4.8
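
The two values can also be reproduced without a dedicated calculator. The short Python sketch below is an illustration that evaluates Equation (6.2) term by term with `math.comb`; the variable names are arbitrary.

```python
from math import comb

n, p, i = 100, 0.75, 70

# Full probability mass function of a binomial (100, .75) random variable.
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

p_equal = pmf[i]                 # P{X = 70}
p_at_most = sum(pmf[: i + 1])    # P{X <= 70}
print(f"{p_equal:.8f}  {p_at_most:.8f}")
```

Direct summation is fine at this size; for much larger n the recursion (6.3) avoids recomputing each binomial coefficient from scratch.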


4.7 The Poisson Random Variable 


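A random variable X that takes on one of the values 0, 1, 2, ... is said to be a Poisson
random variable with parameter $\lambda$ if, for some $\lambda > 0$,

\[ p(i) = P\{X = i\} = e^{-\lambda} \frac{\lambda^i}{i!}, \quad i = 0, 1, 2, \ldots \tag{7.1} \]

Equation (7.1) defines a probability mass function, since

\[ \sum_{i=0}^{\infty} p(i) = e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = e^{-\lambda} e^{\lambda} = 1 \]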


The Poisson probability distribution was introduced by Siméon Denis Poisson in a
book he wrote regarding the application of probability theory to lawsuits, criminal
trials, and the like. This book, published in 1837, was entitled Recherches sur la
probabilité des jugements en matière criminelle et en matière civile (Investigations into the
Probability of Verdicts in Criminal and Civil Matters).

The Poisson random variable has a tremendous range of applications in diverse
areas because it may be used as an approximation for a binomial random variable
with parameters (n, p) when n is large and p is small enough so that np is of moderate
size. To see this, suppose that X is a binomial random variable with parameters (n, p),
and let $\lambda = np$. Then

\[
\begin{aligned}
P\{X = i\} &= \frac{n!}{(n - i)!\,i!}\, p^i (1 - p)^{n-i} \\
  &= \frac{n!}{(n - i)!\,i!} \left(\frac{\lambda}{n}\right)^i \left(1 - \frac{\lambda}{n}\right)^{n-i} \\
  &= \frac{n(n - 1) \cdots (n - i + 1)}{n^i} \, \frac{\lambda^i}{i!} \, \frac{(1 - \lambda/n)^n}{(1 - \lambda/n)^i}
\end{aligned}
\]

Now, for n large and $\lambda$ moderate,

\[
\left(1 - \frac{\lambda}{n}\right)^n \approx e^{-\lambda}, \qquad
\frac{n(n - 1) \cdots (n - i + 1)}{n^i} \approx 1, \qquad
\left(1 - \frac{\lambda}{n}\right)^i \approx 1
\]

Hence, for n large and $\lambda$ moderate,

\[ P\{X = i\} \approx e^{-\lambda} \frac{\lambda^i}{i!} \]

In other words, if n independent trials, each of which results in a success with
probability p, are performed, then when n is large and p is small enough to make
np moderate, the number of successes occurring is approximately a Poisson random
variable with parameter $\lambda = np$. This value $\lambda$ (which will later be shown to equal the
expected number of successes) will usually be determined empirically.
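
The quality of the approximation is easy to inspect numerically. The Python sketch below compares the binomial and Poisson probability mass functions side by side; the values n = 1000 and p = 0.003 are arbitrary illustrative choices (not from the text), and the helper names are likewise assumptions.

```python
from math import comb, exp, factorial

def binomial_pmf(i, n, p):
    """P{X = i} for a binomial (n, p) random variable."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

def poisson_pmf(i, lam):
    """P{X = i} for a Poisson random variable with parameter lam."""
    return exp(-lam) * lam ** i / factorial(i)

n, p = 1000, 0.003               # illustrative: n large, p small, np moderate
lam = n * p

rows = [(i, binomial_pmf(i, n, p), poisson_pmf(i, lam)) for i in range(11)]
for i, b, q in rows:
    print(f"{i:2d}  binomial={b:.5f}  poisson={q:.5f}")

max_gap = max(abs(b - q) for _, b, q in rows)
print("largest gap:", max_gap)
```

Repeating the comparison with a larger p for the same np (for example n = 10, p = 0.3) shows a visibly larger gap, which is why the approximation is stated for n large and p small.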

Some examples of random variables that generally obey the Poisson probability
law [that is, they obey Equation (7.1)] are as follows:


1. The number of misprints on a page (or a group of pages) of a book

2. The number of people in a community who survive to age 100

3. The number of wrong telephone numbers that are dialed in a day

4. The number of packages of dog biscuits sold in a particular store each day

5. The number of customers entering a post office on a given day

6. The number of vacancies occurring during a year in the federal judicial system

7. The number of α-particles discharged in a fixed period of time from some
radioactive material

Each of the preceding and numerous other random variables are approximately
Poisson for the same reason, namely, because of the Poisson approximation to the
binomial. For instance, we can suppose that there is a small probability p that each
letter typed on a page will be misprinted. Hence, the number of misprints on a page
will be approximately Poisson with $\lambda = np$, where n is the number of letters on a
page. Similarly, we can suppose that each person in a community has some small
probability of reaching age 100. Also, each person entering a store may be thought
of as having some small probability of buying a package of dog biscuits, and so on.


Example 7a
Suppose that the number of typographical errors on a single page of this book has a
Poisson distribution with parameter $\lambda = \tfrac{1}{2}$. Calculate the probability that there is at
least one error on this page.


Solution Letting X denote the number of errors on this page, we have

\[ P\{X \ge 1\} = 1 - P\{X = 0\} = 1 - e^{-1/2} \approx .393 \]


Example 7b
Suppose that the probability that an item produced by a certain machine will be
defective is .1. Find the probability that a sample of 10 items will contain at most 1
defective item.


Solution The desired probability is
$\binom{10}{0} (.1)^0 (.9)^{10} + \binom{10}{1} (.1)^1 (.9)^9 = .7361$,
whereas the Poisson approximation yields the value $e^{-1} + e^{-1} \approx .7358$.


Example 7c
Consider an experiment that consists of counting the number of α particles given
off in a 1-second interval by 1 gram of radioactive material. If we know from past
experience that on the average, 3.2 such α particles are given off, what is a good
approximation to the probability that no more than 2 α particles will appear?


Solution If we think of the gram of radioactive material as consisting of a large
number n of atoms, each of which has probability of 3.2/n of disintegrating and
sending off an α particle during the second considered, then we see that to a very close
approximation, the number of α particles given off will be a Poisson random variable
with parameter $\lambda = 3.2$. Hence, the desired probability is

\[ P\{X \le 2\} = e^{-3.2} + 3.2\,e^{-3.2} + \frac{(3.2)^2}{2}\, e^{-3.2} \approx .3799 \]
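
The numerical answers in Examples 7a through 7c can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic; `poisson_pmf` and `poisson_cdf` are illustrative helper names rather than functions from any particular library.

```python
from math import comb, exp, factorial

def poisson_pmf(i, lam):
    """P{X = i} for a Poisson random variable with parameter lam."""
    return exp(-lam) * lam ** i / factorial(i)

def poisson_cdf(i, lam):
    """P{X <= i} for a Poisson random variable with parameter lam."""
    return sum(poisson_pmf(j, lam) for j in range(i + 1))

# Example 7a: at least one error when lam = 1/2.
p_7a = 1 - poisson_pmf(0, 0.5)

# Example 7b: exact binomial (10, .1) value versus the Poisson value with lam = 1.
exact_7b = sum(comb(10, i) * 0.1 ** i * 0.9 ** (10 - i) for i in range(2))
approx_7b = poisson_cdf(1, 1.0)

# Example 7c: no more than 2 particles when lam = 3.2.
p_7c = poisson_cdf(2, 3.2)

print(round(p_7a, 3), round(exact_7b, 4), round(approx_7b, 4), round(p_7c, 4))
```

In Example 7b the two values agree to three decimal places even though n is only 10.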


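Before computing the expected value and variance of the Poisson random
variable with parameter $\lambda$, recall that this random variable approximates a binomial
random variable with parameters n and p when n is large, p is small, and $\lambda = np$.
Since such a binomial random variable has expected value $np = \lambda$ and variance
$np(1 - p) = \lambda(1 - p) \approx \lambda$ (since p is small), it would seem that both the expected
value and the variance of a Poisson random variable would equal its parameter $\lambda$.
We now verify this result:

\[
\begin{aligned}
E[X] &= \sum_{i=0}^{\infty} \frac{i\, e^{-\lambda} \lambda^i}{i!} \\
     &= \lambda \sum_{i=1}^{\infty} \frac{e^{-\lambda} \lambda^{i-1}}{(i - 1)!} \\
     &= \lambda e^{-\lambda} \sum_{j=0}^{\infty} \frac{\lambda^j}{j!} \qquad \text{by letting } j = i - 1 \\
     &= \lambda \qquad \text{since } \sum_{j=0}^{\infty} \frac{\lambda^j}{j!} = e^{\lambda}
\end{aligned}
\]

Thus, the expected value of a Poisson random variable X is indeed equal to its
parameter $\lambda$. To determine its variance, we first compute $E[X^2]$:

\[
\begin{aligned}
E[X^2] &= \sum_{i=0}^{\infty} \frac{i^2 e^{-\lambda} \lambda^i}{i!} \\
       &= \lambda \sum_{i=1}^{\infty} \frac{i\, e^{-\lambda} \lambda^{i-1}}{(i - 1)!} \\
       &= \lambda \sum_{j=0}^{\infty} \frac{(j + 1)\, e^{-\lambda} \lambda^j}{j!} \qquad \text{by letting } j = i - 1 \\
       &= \lambda \left[ \sum_{j=0}^{\infty} \frac{j\, e^{-\lambda} \lambda^j}{j!} + \sum_{j=0}^{\infty} \frac{e^{-\lambda} \lambda^j}{j!} \right] \\
       &= \lambda(\lambda + 1)
\end{aligned}
\]

where the final equality follows because the first sum is the expected value of a
Poisson random variable with parameter $\lambda$ and the second is the sum of the
probabilities of this random variable. Therefore, since we have shown that $E[X] = \lambda$, we
obtain

\[
\begin{aligned}
\mathrm{Var}(X) &= E[X^2] - (E[X])^2 \\
  &= \lambda
\end{aligned}
\]

Hence, the expected value and variance of a Poisson random variable are both
equal to its parameter $\lambda$.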

We have shown that the Poisson distribution with parameter np is a very good 
approximation to the distribution of the number of successes in n independent trials 
when each trial has probability p of being a success, provided that n is large and p 
small. In fact, it remains a good approximation even when the trials are not inde- 
pendent, provided that their dependence is weak. For instance, recall the matching 
problem (Example 5m of Chapter 2) in which n men randomly select hats from a set 
consisting of one hat from each person. From the point of view of the number of 
men who select their own hat, we may regard the random selection as the result of n 
trials where we say that trial i is a success if person i selects his own hat, i = 1, ..., n.
Defining the events $E_i$, i = 1, ..., n, by

$$E_i = \{\text{trial } i \text{ is a success}\}$$

it is easy to see that

$$P\{E_i\} = \frac{1}{n} \quad \text{and} \quad P\{E_i \mid E_j\} = \frac{1}{n-1}, \quad j \ne i$$

Thus, we see that although the events $E_i$, i = 1, ..., n, are not independent, their
dependence, for large n, appears to be weak. Because of this, it seems reasonable to
expect that the number of successes will approximately have a Poisson distribution
with parameter n × 1/n = 1, and indeed this is verified in Example 5m of Chapter 2.

For a second illustration of the strength of the Poisson approximation when the 
trials are weakly dependent, let us consider again the birthday problem presented in 
Example 5i of Chapter 2. In this example, we suppose that each of n people is equally
likely to have any of the 365 days of the year as his or her birthday, and the problem
is to determine the probability that a set of n independent people all have different
birthdays. A combinatorial argument was used to determine this probability, which
was shown to be less than 1/2 when n = 23.

We can approximate the preceding probability by using the Poisson approximation
as follows: Imagine that we have a trial for each of the $\binom{n}{2}$ pairs of individuals
i and j, i ≠ j, and say that trial i, j is a success if persons i and j have the same
birthday. If we let $E_{ij}$ denote the event that trial i, j is a success, then, whereas the
$\binom{n}{2}$ events $E_{ij}$, 1 ≤ i < j ≤ n, are not independent (see Theoretical Exercise 4.21),
their dependence appears to be rather weak. (Indeed, these events are even pairwise
independent, in that any 2 of the events $E_{ij}$ and $E_{kl}$ are independent; again, see
Theoretical Exercise 4.21.) Since $P(E_{ij}) = 1/365$, it is reasonable to suppose that the
number of successes should approximately have a Poisson distribution with mean
$\binom{n}{2}/365 = n(n-1)/730$. Therefore,


$$P\{\text{no 2 people have the same birthday}\} = P\{0 \text{ successes}\} \approx \exp\left\{\frac{-n(n-1)}{730}\right\}$$

To determine the smallest integer n for which this probability is less than 1/2, note that

$$\exp\left\{\frac{-n(n-1)}{730}\right\} \le \frac{1}{2}$$

is equivalent to

$$\exp\left\{\frac{n(n-1)}{730}\right\} \ge 2$$

Taking logarithms of both sides, we obtain

$$n(n-1) \ge 730 \log 2 \approx 505.997$$

which yields the solution n = 23, in agreement with the result of Example 5i of
Chapter 2.
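As a rough numerical check of this agreement, the following Python sketch finds the smallest n for which each probability drops below 1/2. The exact product formula for distinct birthdays is the standard combinatorial one; the helper names are illustrative assumptions, not notation from the text.

```python
import math

def approx_all_distinct(n):
    """Poisson approximation: exp{-n(n-1)/730}."""
    return math.exp(-n * (n - 1) / 730)

def exact_all_distinct(n):
    """Exact probability that n people all have different birthdays (365 equally likely days)."""
    prob = 1.0
    for i in range(n):
        prob *= (365 - i) / 365
    return prob

def smallest_n(prob_fn, threshold=0.5):
    """Smallest n for which prob_fn(n) falls below the threshold."""
    n = 1
    while prob_fn(n) >= threshold:
        n += 1
    return n

print(smallest_n(approx_all_distinct), smallest_n(exact_all_distinct))  # 23 23
```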

Suppose now that we wanted the probability that among the n people, no 3 of
them have their birthday on the same day. Whereas this now becomes a difficult
combinatorial problem, it is a simple matter to obtain a good approximation. To
begin, imagine that we have a trial for each of the $\binom{n}{3}$ triplets i, j, k, where
1 ≤ i < j < k ≤ n, and call the i, j, k trial a success if persons i, j, and k all have their
birthday on the same day. As before, we can then conclude that the number of successes is
approximately a Poisson random variable with parameter

$$\binom{n}{3} P\{i, j, k \text{ have the same birthday}\} = \binom{n}{3}\left(\frac{1}{365}\right)^2 = \frac{n(n-1)(n-2)}{6 \times (365)^2}$$

Hence,

$$P\{\text{no 3 have the same birthday}\} \approx \exp\left\{\frac{-n(n-1)(n-2)}{799350}\right\}$$

This probability will be less than 1/2 when n is such that

$$n(n-1)(n-2) \ge 799350 \log 2 \approx 554067.1$$

which is equivalent to n ≥ 84. Thus, the approximate probability that at least 3 people
in a group of size 84 or larger will have the same birthday exceeds 1/2.

For the number of events to occur to approximately have a Poisson distribution,
it is not essential that all the events have the same probability of occurrence, but
only that all of these probabilities be small. The following is referred to as the
Poisson paradigm.


Poisson Paradigm. Consider n events, with $p_i$ equal to the probability that
event i occurs, i = 1, ..., n. If all the $p_i$ are "small" and the trials are either
independent or at most "weakly dependent," then the number of these events
that occur approximately has a Poisson distribution with mean $\sum_{i=1}^{n} p_i$.


Our next example not only makes use of the Poisson paradigm, but also
illustrates a variety of the techniques we have studied so far.


Example 7d  Length of the longest run


A coin is flipped n times. Assuming that the flips are independent, with each one
coming up heads with probability p, what is the probability that there is a string of k
consecutive heads?


Solution We will first use the Poisson paradigm to approximate this probability.
Now, if for i = 1, ..., n - k + 1, we let $H_i$ denote the event that flips i, i + 1, ..., i + k - 1
all land on heads, then the desired probability is that at least one of the events
$H_i$ occur. Because $H_i$ is the event that starting with flip i, the next k flips all land
on heads, it follows that $P(H_i) = p^k$. Thus, when $p^k$ is small, we might think that
the number of the $H_i$ that occur should have an approximate Poisson distribution.
However, such is not the case, because, although the events all have small
probabilities, some of their dependencies are too great for the Poisson distribution to be
a good approximation. For instance, because the conditional probability that flips
2, ..., k + 1 are all heads given that flips 1, ..., k are all heads is equal to the
probability that flip k + 1 is a head, it follows that

$$P(H_2 \mid H_1) = p$$

which is far greater than the unconditional probability of $H_2$.

The trick that enables us to use a Poisson approximation is to note that there
will be a string of k consecutive heads either if there is such a string that is
immediately followed by a tail or if the final k flips all land on heads. Consequently, for
i = 1, ..., n - k, let $E_i$ be the event that flips i, ..., i + k - 1 are all heads and flip
i + k is a tail; also, let $E_{n-k+1}$ be the event that flips n - k + 1, ..., n are all heads.
Note that

$$P(E_i) = p^k(1-p), \quad i \le n - k$$
$$P(E_{n-k+1}) = p^k$$

Thus, when $p^k$ is small, each of the events $E_i$ has a small probability of occurring.
Moreover, for i ≠ j, if the events $E_i$ and $E_j$ refer to nonoverlapping sequences of flips,
then $P(E_i \mid E_j) = P(E_i)$; if they refer to overlapping sequences, then $P(E_i \mid E_j) = 0$.
Hence, in both cases, the conditional probabilities are close to the unconditional
ones, indicating that N, the number of the events $E_i$ that occur, should have an
approximate Poisson distribution with mean

$$E[N] = \sum_{i=1}^{n-k+1} P(E_i) = (n-k)p^k(1-p) + p^k$$

Because there will not be a run of k heads if (and only if) N = 0, the preceding gives

$$P(\text{no head strings of length } k) = P(N = 0) \approx \exp\{-(n-k)p^k(1-p) - p^k\}$$

If we let $L_n$ denote the largest number of consecutive heads in the n flips, then,
because $L_n$ will be less than k if (and only if) there are no head strings of length k,
the preceding equation can be written as

$$P\{L_n < k\} \approx \exp\{-(n-k)p^k(1-p) - p^k\}$$

Now, let us suppose that the coin being flipped is fair; that is, suppose that p = 1/2.
Then the preceding gives

$$P\{L_n < k\} \approx \exp\left\{-\frac{n-k+2}{2^{k+1}}\right\} \approx \exp\left\{-\frac{n}{2^{k+1}}\right\}$$

where the final approximation supposes that $e^{(k-2)/2^{k+1}} \approx 1$ (that is, that $(k-2)/2^{k+1} \approx 0$). Let
$j = \log_2 n$, and assume that j is an integer. For k = j + i,

$$\frac{n}{2^{k+1}} = \frac{n}{2^{j} 2^{i+1}} = \frac{1}{2^{i+1}}$$

Consequently,

$$P\{L_n < j + i\} \approx \exp\{-(1/2)^{i+1}\}$$

which implies that

$$P\{L_n = j + i\} = P\{L_n < j + i + 1\} - P\{L_n < j + i\} \approx \exp\{-(1/2)^{i+2}\} - \exp\{-(1/2)^{i+1}\}$$


For instance,

$$\begin{aligned}
P\{L_n < j - 3\} &\approx e^{-4} \approx .0183 \\
P\{L_n = j - 3\} &\approx e^{-2} - e^{-4} \approx .1170 \\
P\{L_n = j - 2\} &\approx e^{-1} - e^{-2} \approx .2325 \\
P\{L_n = j - 1\} &\approx e^{-1/2} - e^{-1} \approx .2387 \\
P\{L_n = j\} &\approx e^{-1/4} - e^{-1/2} \approx .1723 \\
P\{L_n = j + 1\} &\approx e^{-1/8} - e^{-1/4} \approx .1037 \\
P\{L_n = j + 2\} &\approx e^{-1/16} - e^{-1/8} \approx .0569 \\
P\{L_n = j + 3\} &\approx e^{-1/32} - e^{-1/16} \approx .0298 \\
P\{L_n \ge j + 4\} &\approx 1 - e^{-1/32} \approx .0308
\end{aligned}$$

Thus, we observe the rather interesting fact that no matter how large n is, the length
of the longest run of heads in a sequence of n flips of a fair coin will be within 2 of
$\log_2(n) - 1$ with a probability approximately equal to .86.
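The tabulated approximations, and the figure of roughly .86, can be reproduced with a short Python sketch. It evaluates only the approximation exp{-(1/2)^(i+1)} for P{L_n < j + i} with j = log_2 n; the function names are illustrative and not from the text.

```python
import math

def approx_cdf_below(i):
    """Approximation to P{L_n < j + i}, where j = log2(n)."""
    return math.exp(-(0.5 ** (i + 1)))

def approx_pmf(i):
    """Approximation to P{L_n = j + i}."""
    return approx_cdf_below(i + 1) - approx_cdf_below(i)

for i in range(-3, 4):
    print(f"P{{L_n = j{i:+d}}} is approximately {approx_pmf(i):.4f}")

# "Within 2 of log2(n) - 1" means L_n takes one of the values j - 3, ..., j + 1
within_two = sum(approx_pmf(i) for i in range(-3, 2))
print(round(within_two, 2))  # 0.86
```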
We now derive an exact expression for the probability that there is a string of
k consecutive heads when a coin that lands on heads with probability p is flipped
n times. With the events $E_i$, i = 1, ..., n - k + 1, as defined earlier, and with $L_n$
denoting, as before, the length of the longest run of heads,

$$P(L_n \ge k) = P(\text{there is a string of } k \text{ consecutive heads}) = P\left(\bigcup_{i=1}^{n-k+1} E_i\right)$$

The inclusion-exclusion identity for the probability of a union can be written as

$$P\left(\bigcup_{i=1}^{n-k+1} E_i\right) = \sum_{r=1}^{n-k+1} (-1)^{r+1} \sum_{i_1 < \cdots < i_r} P(E_{i_1} \cdots E_{i_r})$$

Let $S_i$ denote the set of flip numbers to which the event $E_i$ refers. (So, for instance,
$S_1 = \{1, \ldots, k + 1\}$.) Now, consider one of the r-way intersection probabilities that
does not include the event $E_{n-k+1}$. That is, consider $P(E_{i_1} \cdots E_{i_r})$ where
$i_1 < \cdots < i_r < n - k + 1$. On the one hand, if there is any overlap in the sets $S_{i_1}, \ldots, S_{i_r}$,
then this probability is 0. On the other hand, if there is no overlap, then the events
$E_{i_1}, \ldots, E_{i_r}$ are independent. Therefore,

$$P(E_{i_1} \cdots E_{i_r}) = \begin{cases} 0, & \text{if there is any overlap in } S_{i_1}, \ldots, S_{i_r} \\ p^{rk}(1-p)^{r}, & \text{if there is no overlap} \end{cases}$$

We must now determine the number of different choices of $i_1 < \cdots < i_r < n - k + 1$
for which there is no overlap in the sets $S_{i_1}, \ldots, S_{i_r}$. To do so, note first that each
of the $S_{i_j}$, j = 1, ..., r, refers to k + 1 flips, so, without any overlap, they together
refer to r(k + 1) flips. Now consider any permutation of r identical letters a and
of n - r(k + 1) identical letters b. Interpret the number of b's before the first a
as the number of flips before $S_{i_1}$, the number of b's between the first and second
a as the number of flips between $S_{i_1}$ and $S_{i_2}$, and so on, with the number of b's
after the final a representing the number of flips after $S_{i_r}$. Because there are $\binom{n-rk}{r}$
permutations of r letters a and of n - r(k + 1) letters b, with every such permutation
corresponding (in a one-to-one fashion) to a different nonoverlapping choice, it
follows that

$$\sum_{i_1 < \cdots < i_r < n-k+1} P(E_{i_1} \cdots E_{i_r}) = \binom{n-rk}{r} p^{rk}(1-p)^{r}$$

We must now consider r-way intersection probabilities of the form

$$P(E_{i_1} \cdots E_{i_{r-1}} E_{n-k+1}),$$

where $i_1 < \cdots < i_{r-1} < n - k + 1$. Now, this probability will equal 0 if there
is any overlap in $S_{i_1}, \ldots, S_{i_{r-1}}, S_{n-k+1}$; if there is no overlap, then the events of the
intersection will be independent, so

$$P(E_{i_1} \cdots E_{i_{r-1}} E_{n-k+1}) = [p^k(1-p)]^{r-1} p^k = p^{rk}(1-p)^{r-1}$$

By a similar argument as before, the number of nonoverlapping sets $S_{i_1}, \ldots, S_{i_{r-1}}, S_{n-k+1}$
will equal the number of permutations of r - 1 letters a (one for each of the
sets $S_{i_1}, \ldots, S_{i_{r-1}}$) and of n - (r - 1)(k + 1) - k = n - rk - (r - 1) letters b (one
for each of the trials that are not part of any of $S_{i_1}, \ldots, S_{i_{r-1}}, S_{n-k+1}$). Since there are
$\binom{n-rk}{r-1}$ permutations of r - 1 letters a and of n - rk - (r - 1) letters b, we have

$$\sum_{i_1 < \cdots < i_{r-1} < n-k+1} P(E_{i_1} \cdots E_{i_{r-1}} E_{n-k+1}) = \binom{n-rk}{r-1} p^{rk}(1-p)^{r-1}$$

Putting it all together yields the exact expression, namely,

$$P(L_n \ge k) = \sum_{r=1}^{n-k+1} (-1)^{r+1} \left[ \binom{n-rk}{r} + \frac{1}{1-p} \binom{n-rk}{r-1} \right] p^{rk}(1-p)^{r}$$

where we utilize the convention that $\binom{m}{j} = 0$ if m < j.

From a computational point of view, a more efficient method for computing the
desired probability than the use of the preceding identity is to derive a set of
recursive equations. To do so, let $A_n$ be the event that there is a string of k consecutive
heads in a sequence of n flips, and let $P_n = P(A_n)$. We will derive a set of recursive
equations for $P_n$ by conditioning on when the first tail appears. For j = 1, ..., k, let
$F_j$ be the event that the first tail appears on flip j, and let H be the event that the
first k flips are all heads. Because the events $F_1, \ldots, F_k, H$ are mutually exclusive and
exhaustive (that is, exactly one of these events must occur), we have

$$P(A_n) = \sum_{j=1}^{k} P(A_n \mid F_j) P(F_j) + P(A_n \mid H) P(H)$$

Now, given that the first tail appears on flip j, where j ≤ k, it follows that those j
flips are wasted as far as obtaining a string of k heads in a row; thus, the conditional
probability of this event is the probability that such a string will occur among the
remaining n - j flips. Therefore,

$$P(A_n \mid F_j) = P_{n-j}$$

Because $P(A_n \mid H) = 1$, the preceding equation gives


$$\begin{aligned}
P_n = P(A_n) &= \sum_{j=1}^{k} P_{n-j} P(F_j) + P(H) \\
&= \sum_{j=1}^{k} P_{n-j}\, p^{j-1}(1-p) + p^k
\end{aligned}$$

Starting with $P_j = 0$, j < k, and $P_k = p^k$, we can use the latter formula to
recursively compute $P_{k+1}$, $P_{k+2}$, and so on, up to $P_n$. For instance, suppose we want the
probability that there is a run of 2 consecutive heads when a fair coin is flipped 4
times. Then, with k = 2, we have $P_1 = 0$, $P_2 = (1/2)^2$. Because, when p = 1/2, the
recursion becomes

$$P_n = \sum_{j=1}^{k} P_{n-j} (1/2)^{j} + (1/2)^{k}$$

we obtain

$$P_3 = P_2(1/2) + P_1(1/2)^2 + (1/2)^2 = 3/8$$

and

$$P_4 = P_3(1/2) + P_2(1/2)^2 + (1/2)^2 = 1/2$$

which is clearly true because there are 8 outcomes that result in a string of 2
consecutive heads: hhhh, hhht, hhth, hthh, thhh, hhtt, thht, and tthh. Each of these outcomes
occurs with probability 1/16. ■
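The three ways of computing the probability of a run of k heads in this example (the recursion, the inclusion-exclusion expression, and the Poisson approximation) can be compared in Python. The following is a sketch under the assumption 0 < p < 1; the function names are illustrative, and `binom` is a small helper that encodes the convention that a binomial coefficient is 0 when the top index is smaller than the bottom one.

```python
import math

def binom(m, j):
    """Binomial coefficient, taken to be 0 when m < j or j < 0."""
    if j < 0 or m < j:
        return 0
    return math.comb(m, j)

def run_prob_recursive(n, k, p):
    """P_n from P_n = sum_{j=1}^k P_{n-j} p^(j-1) (1-p) + p^k, with P_j = 0 for j < k."""
    P = [0.0] * (n + 1)
    for m in range(k, n + 1):
        P[m] = p**k + sum(P[m - j] * p**(j - 1) * (1 - p) for j in range(1, k + 1))
    return P[n]

def run_prob_inclusion_exclusion(n, k, p):
    """Exact P(L_n >= k) from the inclusion-exclusion expression (requires p < 1)."""
    total = 0.0
    for r in range(1, n - k + 2):
        coeff = binom(n - r * k, r) + binom(n - r * k, r - 1) / (1 - p)
        total += (-1) ** (r + 1) * coeff * p ** (r * k) * (1 - p) ** r
    return total

def run_prob_poisson(n, k, p):
    """Poisson-paradigm approximation 1 - exp{-(n-k) p^k (1-p) - p^k}."""
    return 1 - math.exp(-(n - k) * p**k * (1 - p) - p**k)

print(run_prob_recursive(4, 2, 0.5))            # 0.5
print(run_prob_inclusion_exclusion(4, 2, 0.5))  # 0.5
```

For larger n the recursion needs on the order of nk multiplications and avoids the alternating sum, which is the efficiency point made in the text.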


Another use of the Poisson probability distribution arises in situations where 
“events” occur at certain points in time. One example is to designate the occurrence 
of an earthquake as an event; another possibility would be for events to correspond 
to people entering a particular establishment (bank, post office, gas station, and so 
on); and a third possibility is for an event to occur whenever a war starts. Let us 
suppose that events are indeed occurring at certain (random) points of time, and let 
us assume that for some positive constant λ, the following assumptions hold true:


1. The probability that exactly 1 event occurs in a given interval of length h is
equal to λh + o(h), where o(h) stands for any function f(h) for which
$\lim_{h \to 0} f(h)/h = 0$. [For instance, $f(h) = h^2$ is o(h), whereas f(h) = h is not.]

2. The probability that 2 or more events occur in an interval of length h is equal
to o(h).

3. For any integers $n, j_1, j_2, \ldots, j_n$ and any set of n nonoverlapping intervals, if
we define $E_i$ to be the event that exactly $j_i$ of the events under consideration
occur in the ith of these intervals, then events $E_1, E_2, \ldots, E_n$ are independent.


Loosely put, assumptions 1 and 2 state that for small values of h, the probability
that exactly 1 event occurs in an interval of size h equals λh plus something that is
small compared with h, whereas the probability that 2 or more events occur is small
compared with h. Assumption 3 states that whatever occurs in one interval has no
(probability) effect on what will occur in other, nonoverlapping intervals.

We now show that under assumptions 1, 2, and 3, the number of events occurring
in any interval of length t is a Poisson random variable with parameter λt. To be
precise, let us call the interval [0, t] and denote the number of events occurring in
that interval by N(t). To obtain an expression for P{N(t) = k}, we start by breaking
the interval [0, t] into n nonoverlapping subintervals, each of length t/n (Figure 4.9).

[Figure 4.9: the interval [0, t] divided into n subintervals with endpoints 0, t/n, 2t/n, 3t/n, ..., (n - 1)t/n, t = nt/n.]

Now,

$$\begin{aligned}
P\{N(t) = k\} = {} & P\{k \text{ of the } n \text{ subintervals contain exactly 1 event and the other } n - k \text{ contain 0 events}\} \\
& + P\{N(t) = k \text{ and at least 1 subinterval contains 2 or more events}\}
\end{aligned} \tag{7.2}$$

The preceding equation holds because the event on the left side of Equation (7.2),
that is, {N(t) = k}, is clearly equal to the union of the two mutually exclusive events
on the right side of the equation. Letting A and B denote the two mutually exclusive
events on the right side of Equation (7.2), we have


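$$\begin{aligned}
P(B) &\le P\{\text{at least one subinterval contains 2 or more events}\} \\
&= P\left(\bigcup_{i=1}^{n} \{i\text{th subinterval contains 2 or more events}\}\right) \\
&\le \sum_{i=1}^{n} P\{i\text{th subinterval contains 2 or more events}\} \quad \text{by Boole's inequality} \\
&= \sum_{i=1}^{n} o\left(\frac{t}{n}\right) \quad \text{by assumption 2} \\
&= n\, o\left(\frac{t}{n}\right) \\
&= t \left[\frac{o(t/n)}{t/n}\right]
\end{aligned}$$

Now, for any t, t/n → 0 as n → ∞, so o(t/n)/(t/n) → 0 as n → ∞, by the definition of
o(h). Hence,

$$P(B) \to 0 \quad \text{as } n \to \infty \tag{7.3}$$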


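Moreover, since assumptions 1 and 2 imply that*

$$P\{0 \text{ events occur in an interval of length } h\} = 1 - [\lambda h + o(h) + o(h)] = 1 - \lambda h - o(h)$$

we see from the independence assumption (number 3) that

$$\begin{aligned}
P(A) &= P\{k \text{ of the subintervals contain exactly 1 event and the other } n - k \text{ contain 0 events}\} \\
&= \binom{n}{k} \left[\frac{\lambda t}{n} + o\left(\frac{t}{n}\right)\right]^{k} \left[1 - \frac{\lambda t}{n} - o\left(\frac{t}{n}\right)\right]^{n-k}
\end{aligned}$$

* The sum of two functions, both of which are o(h), is also o(h). This is so because if
$\lim_{h \to 0} f(h)/h = \lim_{h \to 0} g(h)/h = 0$, then $\lim_{h \to 0} [f(h) + g(h)]/h = 0$.

However, since

$$n\left[\frac{\lambda t}{n} + o\left(\frac{t}{n}\right)\right] = \lambda t + t\left[\frac{o(t/n)}{t/n}\right] \to \lambda t \quad \text{as } n \to \infty$$

it follows, by the same argument that verified the Poisson approximation to the
binomial, that

$$P(A) \to e^{-\lambda t} \frac{(\lambda t)^{k}}{k!} \quad \text{as } n \to \infty \tag{7.4}$$

Thus, from Equations (7.2), (7.3), and (7.4), by letting n → ∞, we obtain

$$P\{N(t) = k\} = e^{-\lambda t} \frac{(\lambda t)^{k}}{k!}, \quad k = 0, 1, \ldots \tag{7.5}$$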


Hence, if assumptions 1, 2, and 3 are satisfied, then the number of events
occurring in any fixed interval of length t is a Poisson random variable with mean λt, and
we say that the events occur in accordance with a Poisson process having rate λ. The
value λ, which can be shown to equal the rate per unit time at which events occur, is
a constant that must be empirically determined.
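The limiting step behind this result can be illustrated numerically. The sketch below drops the o(t/n) terms, so each of the n subintervals is treated as containing exactly one event with probability λt/n and none otherwise; that simplification, and the function names, are assumptions made for illustration. The binomial probability then approaches the Poisson probability with mean λt as n grows.

```python
import math

def prob_A(n, k, lam, t):
    """Binomial stand-in for P(A): k of n subintervals hold one event, o(t/n) terms ignored."""
    q = lam * t / n
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)

def poisson_pmf(k, lam, t):
    """P{N(t) = k} for a Poisson process with rate lam."""
    return math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)

lam, t, k = 2.0, 2.0, 3
for n in (10, 100, 1000, 10000):
    gap = abs(prob_A(n, k, lam, t) - poisson_pmf(k, lam, t))
    print(n, f"{gap:.6f}")  # the gap shrinks as n increases
```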

The preceding discussion explains why a Poisson random variable is usually a 
good approximation for such diverse phenomena as the following: 


1. The number of earthquakes occurring during some fixed time span 

2. The number of wars per year 

3. The number of electrons emitted from a heated cathode during a fixed time 
period 

4. The number of deaths, in a given period of time, of the policyholders of a life 
insurance company 


Suppose that earthquakes occur in the western portion of the United States in
accordance with assumptions 1, 2, and 3, with λ = 2 and with 1 week as the unit of time.
(That is, earthquakes occur in accordance with the three assumptions at a rate of 2
per week.)
(a) Find the probability that at least 3 earthquakes occur during the next 2 weeks.
(b) Find the probability distribution of the time, starting from now, until the next
earthquake.

Solution (a) From Equation (7.5), we have

$$\begin{aligned}
P\{N(2) \ge 3\} &= 1 - P\{N(2) = 0\} - P\{N(2) = 1\} - P\{N(2) = 2\} \\
&= 1 - e^{-4} - 4e^{-4} - \frac{4^2}{2} e^{-4} \\
&= 1 - 13e^{-4}
\end{aligned}$$

(b) Let X denote the amount of time (in weeks) until the next earthquake.
Because X will be greater than t if and only if no events occur within the next t
units of time, we have, from Equation (7.5),

$$P\{X > t\} = P\{N(t) = 0\} = e^{-\lambda t}$$

so the probability distribution function F of the random variable X is given by

$$F(t) = P\{X \le t\} = 1 - P\{X > t\} = 1 - e^{-\lambda t} = 1 - e^{-2t}$$ ■

4.7.1 Computing the Poisson Distribution Function


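If X is Poisson with parameter λ, then

$$\frac{P\{X = i + 1\}}{P\{X = i\}} = \frac{e^{-\lambda} \lambda^{i+1}/(i+1)!}{e^{-\lambda} \lambda^{i}/i!} = \frac{\lambda}{i+1} \tag{7.6}$$

Starting with $P\{X = 0\} = e^{-\lambda}$, we can use (7.6) to compute successively

$$\begin{aligned}
P\{X = 1\} &= \lambda P\{X = 0\} \\
P\{X = 2\} &= \frac{\lambda}{2} P\{X = 1\} \\
&\;\;\vdots \\
P\{X = i + 1\} &= \frac{\lambda}{i+1} P\{X = i\}
\end{aligned}$$

We can use a module to compute the Poisson probabilities for Equation (7.6).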


Example 7f  (a) Determine P{X ≤ 90} when X is Poisson with mean 100.
(b) Determine P{Y ≤ 1075} when Y is Poisson with mean 1000.


Solution Using the Poisson calculator of StatCrunch yields the solutions:


(a) P{X ≤ 90} ≈ .17138
(b) P{Y ≤ 1075} ≈ .99095 ■
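In place of a packaged calculator, the recursion (7.6) can be coded directly. The sketch below is one possible implementation, not the method used by StatCrunch; it carries the recursion on a logarithmic scale because the starting value e^{-λ} underflows to zero in floating point when λ = 1000. The function name is an illustrative assumption.

```python
import math

def poisson_cdf(k, lam):
    """P{X <= k} for X Poisson with mean lam, via P{X = i+1} = lam/(i+1) * P{X = i}."""
    log_p = -lam                      # log P{X = 0}
    total = math.exp(log_p)           # may underflow to 0.0 for large lam, which is harmless
    for i in range(k):
        log_p += math.log(lam) - math.log(i + 1)   # log P{X = i + 1}
        total += math.exp(log_p)
    return total

print(poisson_cdf(90, 100))     # should land close to the value reported in part (a)
print(poisson_cdf(1075, 1000))  # should land close to the value reported in part (b)
```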


4.8 Other Discrete Probability Distributions 
4.8.1 The Geometric Random Variable 


Suppose that independent trials, each having a probability p, 0 < p < 1, of being a
success, are performed until a success occurs. If we let X equal the number of trials
required, then

$$P\{X = n\} = (1-p)^{n-1} p, \quad n = 1, 2, \ldots \tag{8.1}$$

Equation (8.1) follows because, in order for X to equal n, it is necessary and
sufficient that the first n - 1 trials are failures and the nth trial is a success. Equation (8.1)
then follows, since the outcomes of the successive trials are assumed to be
independent.

Since

$$\sum_{n=1}^{\infty} P\{X = n\} = p \sum_{n=1}^{\infty} (1-p)^{n-1} = \frac{p}{1 - (1-p)} = 1$$

it follows that with probability 1, a success will eventually occur. Any random
variable X whose probability mass function is given by Equation (8.1) is said to be a
geometric random variable with parameter p.

Example 8a  An urn contains N white and M black balls. Balls are randomly selected, one at a
time, until a black one is obtained. If we assume that each ball selected is replaced
before the next one is drawn, what is the probability that


(a) exactly n draws are needed?
(b) at least k draws are needed?


Solution If we let X denote the number of draws needed to select a black ball, then
X satisfies Equation (8.1) with p = M/(M + N). Hence,


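(a)

$$P\{X = n\} = \left(\frac{N}{M+N}\right)^{n-1} \frac{M}{M+N} = \frac{M N^{n-1}}{(M+N)^{n}}$$

(b)

$$\begin{aligned}
P\{X \ge k\} &= \frac{M}{M+N} \sum_{n=k}^{\infty} \left(\frac{N}{M+N}\right)^{n-1} \\
&= \left(\frac{M}{M+N}\right) \left(\frac{N}{M+N}\right)^{k-1} \Bigg/ \left[1 - \frac{N}{M+N}\right] \\
&= \left(\frac{N}{M+N}\right)^{k-1}
\end{aligned}$$

Of course, part (b) could have been obtained directly, since the probability that at
least k trials are necessary to obtain a success is equal to the probability that the first
k - 1 trials are all failures. That is, for a geometric random variable,

$$P\{X \ge k\} = (1-p)^{k-1}$$ ■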
Example 8b  Find the expected value of a geometric random variable.


Solution With q = 1 - p, we have

$$\begin{aligned}
E[X] &= \sum_{i=1}^{\infty} i q^{i-1} p \\
&= \sum_{i=1}^{\infty} (i - 1 + 1) q^{i-1} p \\
&= \sum_{i=1}^{\infty} (i - 1) q^{i-1} p + \sum_{i=1}^{\infty} q^{i-1} p \\
&= \sum_{j=0}^{\infty} j q^{j} p + 1 \\
&= q \sum_{j=1}^{\infty} j q^{j-1} p + 1 \\
&= qE[X] + 1
\end{aligned}$$

Hence,

$$pE[X] = 1$$

yielding the result

$$E[X] = \frac{1}{p}$$

In other words, if independent trials having a common probability p of being
successful are performed until the first success occurs, then the expected number of required
trials equals 1/p. For instance, the expected number of rolls of a fair die that it takes
to obtain the value 1 is 6. ■


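Find the variance of a geometric random variable.


Solution To determine Var(X), let us first compute $E[X^2]$. With q = 1 - p, we have

$$\begin{aligned}
E[X^2] &= \sum_{i=1}^{\infty} i^2 q^{i-1} p \\
&= \sum_{i=1}^{\infty} (i - 1 + 1)^2 q^{i-1} p \\
&= \sum_{i=1}^{\infty} (i-1)^2 q^{i-1} p + \sum_{i=1}^{\infty} 2(i-1) q^{i-1} p + \sum_{i=1}^{\infty} q^{i-1} p \\
&= \sum_{j=0}^{\infty} j^2 q^{j} p + 2 \sum_{j=1}^{\infty} j q^{j} p + 1 \\
&= qE[X^2] + 2qE[X] + 1
\end{aligned}$$

Using E[X] = 1/p, the equation for $E[X^2]$ yields

$$pE[X^2] = \frac{2q}{p} + 1$$

Hence,

$$E[X^2] = \frac{2q + p}{p^2} = \frac{q + 1}{p^2}$$

giving the result

$$\mathrm{Var}(X) = \frac{q + 1}{p^2} - \frac{1}{p^2} = \frac{q}{p^2} = \frac{1-p}{p^2}$$ ■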
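The formulas E[X] = 1/p and Var(X) = (1 - p)/p^2 for a geometric random variable can be checked by summing the mass function (8.1) directly. The sketch below truncates the infinite sums after a large number of terms, which is an approximation chosen for illustration; the function names are not from the text.

```python
def geometric_pmf(n, p):
    """P{X = n} = (1 - p)^(n - 1) p for n = 1, 2, ..."""
    return (1 - p) ** (n - 1) * p

def geometric_moments(p, terms=10_000):
    """Approximate mean and variance by truncating the defining sums after `terms` terms."""
    mean = sum(n * geometric_pmf(n, p) for n in range(1, terms + 1))
    second = sum(n * n * geometric_pmf(n, p) for n in range(1, terms + 1))
    return mean, second - mean**2

# Rolling a fair die until a 1 appears: p = 1/6
mean, var = geometric_moments(1 / 6)
print(round(mean, 6), round(var, 6))  # 6.0 30.0
```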


4.8.2 The Negative Binomial Random Variable 


Suppose that independent trials, each having probability p, 0 < p < 1, of being a
success are performed until a total of r successes is accumulated. If we let X equal
the number of trials required, then

$$P\{X = n\} = \binom{n-1}{r-1} p^{r} (1-p)^{n-r}, \quad n = r, r + 1, \ldots \tag{8.2}$$

Equation (8.2) follows because, in order for the rth success to occur at the nth trial,
there must be r - 1 successes in the first n - 1 trials and the nth trial must be a success.
The probability of the first event is

$$\binom{n-1}{r-1} p^{r-1} (1-p)^{n-r}$$

and the probability of the second is p; thus, by independence, Equation (8.2) is
established. To verify that a total of r successes must eventually be accumulated, either
we can prove analytically that

$$\sum_{n=r}^{\infty} P\{X = n\} = \sum_{n=r}^{\infty} \binom{n-1}{r-1} p^{r} (1-p)^{n-r} = 1 \tag{8.3}$$

or we can give a probabilistic argument as follows: The number of trials required
to obtain r successes can be expressed as $Y_1 + Y_2 + \cdots + Y_r$, where $Y_1$ equals
the number of trials required for the first success, $Y_2$ the number of additional trials
after the first success until the second success occurs, $Y_3$ the number of additional
trials until the third success, and so on. Because the trials are independent and all
have the same probability of success, it follows that $Y_1, Y_2, \ldots, Y_r$ are all geometric
random variables. Hence, each is finite with probability 1, so $\sum_{i=1}^{r} Y_i$ must also be finite,
establishing Equation (8.3).

Any random variable X whose probability mass function is given by
Equation (8.2) is said to be a negative binomial random variable with parameters
(r, p). Note that a geometric random variable is just a negative binomial with
parameter (1, p).

In the next example, we use the negative binomial to obtain another solution of 
the problem of the points. 


If independent trials, each resulting in a success with probability p, are performed, 
what is the probability of r successes occurring before s failures? 


Solution The solution will be arrived at by noting that r successes will occur before
s failures if and only if the rth success occurs no later than the (r + s - 1)th trial. This
follows because if the rth success occurs before or at the (r + s - 1)th trial, then it must
have occurred before the sth failure, and conversely. Hence, from Equation (8.2), the
desired probability is

$$\sum_{n=r}^{r+s-1} \binom{n-1}{r-1} p^{r} (1-p)^{n-r}$$ ■
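The sum giving the probability that r successes occur before s failures is easy to evaluate in Python. This is a minimal sketch with an illustrative function name. Two sanity checks follow from the setup rather than from the text: by symmetry the answer is 1/2 when r = s and p = 1/2, and the probability of r successes before s failures plus the probability of s failures before r successes must equal 1.

```python
import math

def prob_r_successes_before_s_failures(r, s, p):
    """Sum over n = r, ..., r + s - 1 of C(n-1, r-1) p^r (1-p)^(n-r)."""
    return sum(
        math.comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)
        for n in range(r, r + s)
    )

print(prob_r_successes_before_s_failures(3, 3, 0.5))  # 0.5

# Swapping the roles of success and failure gives the complementary event
a = prob_r_successes_before_s_failures(2, 5, 0.3)
b = prob_r_successes_before_s_failures(5, 2, 0.7)
print(round(a + b, 12))  # 1.0
```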


The Banach match problem 


At all times, a pipe-smoking mathematician carries 2 matchboxes—1 in his left-hand 
pocket and 1 in his right-hand pocket. Each time he needs a match, he is equally 
likely to take it from either pocket. Consider the moment when the mathematician 
first discovers that one of his matchboxes is empty. If it is assumed that both match- 
boxes initially contained N matches, what is the probability that there are exactly k 
matches, k = 0, 1, ..., N, in the other box?


Solution Let E denote the event that the mathematician first discovers that the
right-hand matchbox is empty and that there are k matches in the left-hand box
at the time. Now, this event will occur if and only if the (N + 1)th choice of the
right-hand matchbox is made at the (N + 1 + N - k)th trial. Hence, from Equation (8.2)
(with p = 1/2, r = N + 1, and n = 2N - k + 1), we see that

$$P(E) = \binom{2N-k}{N} \left(\frac{1}{2}\right)^{2N-k+1}$$

Since there is an equal probability that it is the left-hand box that is first discovered
to be empty and there are k matches in the right-hand box at that time, the desired
result is

$$2P(E) = \binom{2N-k}{N} \left(\frac{1}{2}\right)^{2N-k}$$ ■


Example 8f  Compute the expected value and the variance of a negative binomial random
variable with parameters r and p.


Solution We have

$$\begin{aligned}
E[X^k] &= \sum_{n=r}^{\infty} n^{k} \binom{n-1}{r-1} p^{r} (1-p)^{n-r} \\
&= \frac{r}{p} \sum_{n=r}^{\infty} n^{k-1} \binom{n}{r} p^{r+1} (1-p)^{n-r} \quad \text{since } n\binom{n-1}{r-1} = r\binom{n}{r} \\
&= \frac{r}{p} \sum_{m=r+1}^{\infty} (m-1)^{k-1} \binom{m-1}{r} p^{r+1} (1-p)^{m-(r+1)} \quad \text{by setting } m = n + 1 \\
&= \frac{r}{p} E[(Y - 1)^{k-1}]
\end{aligned}$$

where Y is a negative binomial random variable with parameters r + 1, p. Setting
k = 1 in the preceding equation yields

$$E[X] = \frac{r}{p}$$

Setting k = 2 in the equation for $E[X^k]$ and using the formula for the expected value
of a negative binomial random variable gives

$$E[X^2] = \frac{r}{p} E[Y - 1] = \frac{r}{p} \left(\frac{r+1}{p} - 1\right)$$

Therefore,

$$\mathrm{Var}(X) = \frac{r}{p} \left(\frac{r+1}{p} - 1\right) - \left(\frac{r}{p}\right)^2 = \frac{r(1-p)}{p^2}$$ ■

Thus, from Example 8f, if independent trials, each of which is a success with
probability p, are performed, then the expected value and variance of the number of
trials that it takes to amass r successes is r/p and $r(1-p)/p^2$, respectively.

Since a geometric random variable is just a negative binomial with parameter
r = 1, it follows from the preceding example that the variance of a geometric random
variable with parameter p is equal to $(1-p)/p^2$, which checks with the result of
Example 8c.


Example Find the expected value and the variance of the number of times one must throw a
die until the outcome 1 has occurred 4 times.


Solution Since the random variable of interest is a negative binomial with
parameters r = 4 and p = 1/6, it follows that

$$E[X] = 24$$

$$\mathrm{Var}(X) = \frac{4\left(\frac{5}{6}\right)}{\left(\frac{1}{6}\right)^2} = 120$$ ■
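The values 24 and 120 can be confirmed by summing the negative binomial mass function (8.2) numerically. The sketch below truncates the infinite sums, which is an approximation adopted for illustration, and the function names are not from the text.

```python
import math

def neg_binomial_pmf(n, r, p):
    """P{X = n} = C(n-1, r-1) p^r (1-p)^(n-r) for n = r, r+1, ..."""
    return math.comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

def neg_binomial_moments(r, p, terms=2000):
    """Approximate mean and variance by truncating the sums after `terms` terms."""
    ns = range(r, r + terms)
    mean = sum(n * neg_binomial_pmf(n, r, p) for n in ns)
    second = sum(n * n * neg_binomial_pmf(n, r, p) for n in ns)
    return mean, second - mean**2

# Throwing a die until the outcome 1 has occurred 4 times: r = 4, p = 1/6
mean, var = neg_binomial_moments(4, 1 / 6)
print(round(mean, 6), round(var, 6))  # 24.0 120.0
```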


Now, let us suppose that the independent trials are not ended when there have been 
a total of r successes, but that they continue on. Aside from X, the number of trials 
until there have been r successes, some other random variables of interest are, for 
s> 0, 


Y : the number of trials until there have been s failures; 

V: the number of trials until there have been either r successes or s failures; 

Z: the number of trials until there have been both at least r successes and at least 
s failures. 


Because each trial is independently a failure with probability 1 - p, it follows
that Y is a negative binomial random variable with probability mass function

$$P(Y = n) = \binom{n-1}{s-1} (1-p)^{s} p^{n-s}, \quad n \ge s$$

To determine the probability mass function of V = min(X, Y), note that the possible
values of V are all less than r + s. Suppose n < r + s. If either the rth success or the
sth failure occurs at time n then, because n < r + s, the other event would not yet
have occurred. Consequently, V will equal n if either X or Y is equal to n. Because
we cannot have both that X = n and that Y = n, this yields

$$\begin{aligned}
P(V = n) &= P(X = n) + P(Y = n) \\
&= \binom{n-1}{r-1} p^{r} (1-p)^{n-r} + \binom{n-1}{s-1} (1-p)^{s} p^{n-s}, \quad n < r + s
\end{aligned}$$

To determine the probability mass function of Z = max(X, Y), note that Z ≥ r + s.
For n ≥ r + s, if either the rth success or the sth failure occurs at time n then the
other event must have already occurred by time n. Consequently, for n ≥ r + s, Z
will equal n if either X or Y is equal to n. This gives

$$\begin{aligned}
P(Z = n) &= P(X = n) + P(Y = n) \\
&= \binom{n-1}{r-1} p^{r} (1-p)^{n-r} + \binom{n-1}{s-1} (1-p)^{s} p^{n-s}, \quad n \ge r + s
\end{aligned}$$
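A quick way to gain confidence in these two mass functions is to check that each sums to 1 over its own range. The Python sketch below does this for one illustrative choice of r, s, and p; the function names are assumptions, and the infinite sum for Z is truncated.

```python
import math

def nb_pmf(n, r, p):
    """P{the rth success occurs on trial n}; taken to be 0 when n < r."""
    if n < r:
        return 0.0
    return math.comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

def pmf_V(n, r, s, p):
    """P{V = n} with V = min(X, Y); nonzero only for n < r + s."""
    if n >= r + s:
        return 0.0
    # P{Y = n} is a negative binomial probability with "success" meaning failure
    return nb_pmf(n, r, p) + nb_pmf(n, s, 1 - p)

def pmf_Z(n, r, s, p):
    """P{Z = n} with Z = max(X, Y); nonzero only for n >= r + s."""
    if n < r + s:
        return 0.0
    return nb_pmf(n, r, p) + nb_pmf(n, s, 1 - p)

r, s, p = 3, 2, 0.4
total_V = sum(pmf_V(n, r, s, p) for n in range(1, r + s))
total_Z = sum(pmf_Z(n, r, s, p) for n in range(r + s, 3000))  # truncated infinite sum
print(round(total_V, 10), round(total_Z, 10))  # 1.0 1.0
```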


4.8.3 The Hypergeometric Random Variable 


Suppose that a sample of size n is to be chosen randomly (without replacement)
from an urn containing N balls, of which m are white and N - m are black. If we let
X denote the number of white balls selected, then

$$P\{X = i\} = \frac{\binom{m}{i} \binom{N-m}{n-i}}{\binom{N}{n}}, \quad i = 0, 1, \ldots, n \tag{8.4}$$

A random variable X whose probability mass function is given by Equation (8.4) for
some values of n, N, m is said to be a hypergeometric random variable.


Remark Although we have written the hypergeometric probability mass function
with i going from 0 to n, P{X = i} will actually be 0, unless i satisfies the inequalities
n - (N - m) ≤ i ≤ min(n, m). However, Equation (8.4) is always valid because of
our convention that $\binom{r}{k}$ is equal to 0 when either k < 0 or r < k. ■


An unknown number, say, N, of animals inhabit a certain region. To obtain some 
information about the size of the population, ecologists often perform the follow- 
ing experiment: They first catch a number, say, m, of these animals, mark them in 
some manner, and release them. After allowing the marked animals time to disperse 
throughout the region, a new catch of size, say, n, is made. Let X denote the number 
of marked animals in this second capture. If we assume that the population of ani- 
mals in the region remained fixed between the time of the two catches and that each 
time an animal was caught it was equally likely to be any of the remaining uncaught 
animals, it follows that X is a hypergeometric random variable such that 


$$P\{X = i\} = \frac{\binom{m}{i} \binom{N-m}{n-i}}{\binom{N}{n}} = P_i(N)$$

Suppose now that X is observed to equal i. Then, since $P_i(N)$ represents the
probability of the observed event when there are actually N animals present in the
region, it would appear that a reasonable estimate of N would be the value of N
that maximizes $P_i(N)$. Such an estimate is called a maximum likelihood estimate.
(See Theoretical Exercises 13 and 18 for other examples of this type of estimation 
procedure.) 

The maximization of P;(N) can be done most simply by first noting that 


P(N) — (N—m)(N — n) 
P(N —1) N(IN-—m—-—n+i 


Now, the preceding ratio is at least 1 if and only if
\[
(N-m)(N-n) \ge N(N-m-n+i)
\]
or, equivalently, if and only if
\[
N \le \frac{mn}{i}
\]
Thus, $P_i(N)$ is first increasing and then decreasing and reaches its maximum value at
the largest integral value not exceeding mn/i. This value is the maximum
likelihood estimate of N. For example, suppose that the initial catch consisted of
m = 50 animals, which are marked and then released. If a subsequent catch consists
of n = 40 animals of which i = 4 are marked, then we would estimate that there are
some 500 animals in the region. (Note that the preceding estimate could also have
been obtained by assuming that the proportion of marked animals in the region,
m/N, is approximately equal to the proportion of marked animals in our second
catch, i/n.)
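
The maximum likelihood argument above can be checked numerically. The following is a minimal sketch, not part of the text: the function names `likelihood` and `mle_population` and the search range are illustrative assumptions. It evaluates $P_i(N)$ exactly and compares its largest value with the value at the largest integer not exceeding mn/i.

```python
from fractions import Fraction
from math import comb

def likelihood(N, m, n, i):
    """P_i(N): probability that a catch of n contains i marked animals
    when m of the N animals in the region are marked."""
    return Fraction(comb(m, i) * comb(N - m, n - i), comb(N, n))

def mle_population(m, n, i):
    """Largest integer not exceeding mn/i (hypothetical helper name)."""
    return (m * n) // i

m, n, i = 50, 40, 4
N_hat = mle_population(m, n, i)

# N must be at least m + n - i for the observation to be possible;
# the upper end of the search range is an arbitrary assumption.
candidates = range(m + n - i, 2001)
best = max(likelihood(N, m, n, i) for N in candidates)

print(N_hat, likelihood(N_hat, m, n, i) == best)
```

When mn/i is itself an integer, as it is here, the ratio $P_i(N)/P_i(N-1)$ equals 1 at N = mn/i, so N = mn/i - 1 attains the same maximum.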


A purchaser of electrical components buys them in lots of size 10. It is his policy 
to inspect 3 components randomly from a lot and to accept the lot only if all 3 are 
nondefective. If 30 percent of the lots have 4 defective components and 70 percent 
have only 1, what proportion of lots does the purchaser reject? 


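Solution Let A denote the event that the purchaser accepts a lot. Now,
\[
\begin{aligned}
P(A) &= P(A \mid \text{lot has 4 defectives})\,\frac{3}{10} + P(A \mid \text{lot has 1 defective})\,\frac{7}{10} \\
&= \frac{\binom{4}{0}\binom{6}{3}}{\binom{10}{3}}\cdot\frac{3}{10} + \frac{\binom{1}{0}\binom{9}{3}}{\binom{10}{3}}\cdot\frac{7}{10} \\
&= \frac{54}{100}
\end{aligned}
\]
Hence, 46 percent of the lots are rejected.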
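
The arithmetic of this solution can be reproduced exactly. This is a small sketch; the helper name `accept_prob` and its default arguments are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def accept_prob(defective, lot_size=10, sample=3):
    """Probability that all sampled components are nondefective."""
    return Fraction(comb(lot_size - defective, sample), comb(lot_size, sample))

# Condition on the type of lot: 30% have 4 defectives, 70% have 1.
p_accept = Fraction(3, 10) * accept_prob(4) + Fraction(7, 10) * accept_prob(1)
p_reject = 1 - p_accept

print(p_accept, p_reject)
```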


If n balls are randomly chosen without replacement from a set of N balls of
which the fraction p = m/N is white, then the number of white balls selected, X, is
hypergeometric. Now, it would seem that when m and N are large in relation to
n, it shouldn't make much difference whether the selection is being done with or
without replacement, because, no matter which balls have previously been selected,
when m and N are large, each additional selection will be white with a probability
approximately equal to p. In other words, it seems intuitive that when m and N are
large in relation to n, the probability mass function of X should approximately be
that of a binomial random variable with parameters n and p. To verify this intuition,
note that if X is hypergeometric, then, for i ≤ n,


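\[
\begin{aligned}
P\{X = i\} &= \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}} \\
&= \frac{m!}{(m-i)!\,i!}\cdot\frac{(N-m)!}{(N-m-n+i)!\,(n-i)!}\cdot\frac{(N-n)!\,n!}{N!} \\
&= \binom{n}{i}\,\frac{m}{N}\,\frac{m-1}{N-1}\cdots\frac{m-i+1}{N-i+1}\cdot\frac{N-m}{N-i}\,\frac{N-m-1}{N-i-1}\cdots\frac{N-m-(n-i-1)}{N-i-(n-i-1)} \\
&\approx \binom{n}{i}\,p^i(1-p)^{n-i}
\end{aligned}
\]
when p = m/N and m and N are large in relation to n and i.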
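
To see how close the two mass functions are in a concrete case, the sketch below compares them term by term. The parameter values n = 10, N = 10,000, m = 3,000 and the function names are assumptions chosen only for illustration.

```python
from math import comb

def hypergeom_pmf(i, n, N, m):
    """P{X = i} when n balls are drawn without replacement from N, m of them white."""
    return comb(m, i) * comb(N - m, n - i) / comb(N, n)

def binom_pmf(i, n, p):
    """P{X = i} for a binomial random variable with parameters n and p."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

n, N, m = 10, 10_000, 3_000
p = m / N

# Largest pointwise difference between the two mass functions.
max_gap = max(abs(hypergeom_pmf(i, n, N, m) - binom_pmf(i, n, p))
              for i in range(n + 1))
print(max_gap)
```

Shrinking N while holding n and p fixed makes the gap grow, which is consistent with the requirement that m and N be large in relation to n.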
Determine the expected value and the variance of X, a hypergeometric random
variable with parameters n, N, and m.


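Solution
\[
E[X^k] = \sum_{i=0}^{n} i^k P\{X = i\} = \sum_{i=1}^{n} i^k \binom{m}{i}\binom{N-m}{n-i} \Big/ \binom{N}{n}
\]
Using the identities
\[
i\binom{m}{i} = m\binom{m-1}{i-1} \quad\text{and}\quad n\binom{N}{n} = N\binom{N-1}{n-1}
\]
we obtain
\[
\begin{aligned}
E[X^k] &= \frac{nm}{N}\sum_{i=1}^{n} i^{k-1}\binom{m-1}{i-1}\binom{N-m}{n-i} \Big/ \binom{N-1}{n-1} \\
&= \frac{nm}{N}\sum_{j=0}^{n-1} (j+1)^{k-1}\binom{m-1}{j}\binom{N-m}{n-1-j} \Big/ \binom{N-1}{n-1} \\
&= \frac{nm}{N}\,E[(Y+1)^{k-1}]
\end{aligned}
\]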


where Y is a hypergeometric random variable with parameters n - 1, N - 1, and
m - 1. Hence, upon setting k = 1, we have
\[
E[X] = \frac{nm}{N}
\]
In words, if n balls are randomly selected from a set of N balls, of which m are white,
then the expected number of white balls selected is nm/N.
Upon setting k = 2 in the equation for $E[X^k]$, we obtain
\[
E[X^2] = \frac{nm}{N}E[Y+1] = \frac{nm}{N}\left[\frac{(n-1)(m-1)}{N-1} + 1\right]
\]
where the final equality uses our preceding result to compute the expected value of
the hypergeometric random variable Y.
Because E[X] = nm/N, we can conclude that
\[
\mathrm{Var}(X) = \frac{nm}{N}\left[\frac{(n-1)(m-1)}{N-1} + 1 - \frac{nm}{N}\right]
\]


Letting p = m/N and using the identity
\[
\frac{m-1}{N-1} = \frac{Np-1}{N-1} = p - \frac{1-p}{N-1}
\]
shows that
\[
\begin{aligned}
\mathrm{Var}(X) &= np\left[(n-1)p - (n-1)\frac{1-p}{N-1} + 1 - np\right] \\
&= np(1-p)\left(1 - \frac{n-1}{N-1}\right)
\end{aligned}
\]
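
The closed forms for the mean and variance can be checked against a direct summation over the mass function. The sketch below uses exact rational arithmetic; the function name `hypergeom_moments` and the parameter values n = 5, N = 20, m = 8 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def hypergeom_moments(n, N, m):
    """Mean and variance of a hypergeometric random variable by direct summation."""
    pmf = {i: Fraction(comb(m, i) * comb(N - m, n - i), comb(N, n))
           for i in range(0, min(n, m) + 1)}
    mean = sum(i * q for i, q in pmf.items())
    second_moment = sum(i * i * q for i, q in pmf.items())
    return mean, second_moment - mean**2

n, N, m = 5, 20, 8
p = Fraction(m, N)
mean, var = hypergeom_moments(n, N, m)

# Closed forms derived in the text.
mean_formula = n * p
var_formula = n * p * (1 - p) * (1 - Fraction(n - 1, N - 1))
print(mean, var, mean_formula, var_formula)
```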


Remark We have shown in Example 8j that if n balls are randomly selected without
replacement from a set of N balls, of which the fraction p are white, then the expected
number of white balls chosen is np. In addition, if N is large in relation to n [so that
(N - n)/(N - 1) is approximately equal to 1], then
\[
\mathrm{Var}(X) \approx np(1-p)
\]
In other words, E[X] is the same as when the selection of the balls is done with
replacement (so that the number of white balls is binomial with parameters n
and p), and if the total collection of balls is large, then Var(X) is approximately equal
to what it would be if the selection were done with replacement. This is, of course,
exactly what we would have guessed, given our earlier result that when the number
of balls in the urn is large, the number of white balls chosen approximately has the
mass function of a binomial random variable.


4.8.4 The Zeta (or Zipf) Distribution 


A random variable is said to have a zeta (sometimes called the Zipf) distribution if 
its probability mass function is given by 


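\[
P\{X = k\} = \frac{C}{k^{\alpha+1}}, \qquad k = 1, 2, \ldots
\]
for some value of α > 0. Since the sum of the foregoing probabilities must equal 1,
it follows that
\[
C = \left[\sum_{k=1}^{\infty}\left(\frac{1}{k}\right)^{\alpha+1}\right]^{-1}
\]
The zeta distribution owes its name to the fact that the function
\[
\zeta(s) = 1 + \left(\frac{1}{2}\right)^s + \left(\frac{1}{3}\right)^s + \cdots + \left(\frac{1}{k}\right)^s + \cdots
\]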


is known in mathematical disciplines as the Riemann zeta function (after the 
German mathematician G. F. B. Riemann). 
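
The normalizing constant C has no elementary closed form in general, but it is easy to approximate. The following is a rough sketch under stated assumptions: the series is truncated after a fixed number of terms and the remainder is estimated by an integral, and the names `zeta_constant` and `zeta_pmf` are hypothetical. For α = 1 the exact value is known, since ζ(2) = π²/6.

```python
import math

def zeta_constant(alpha, terms=10_000):
    """Approximate C = 1 / zeta(alpha + 1) by a truncated sum plus an
    integral estimate of the tail (a midpoint-style correction)."""
    s = alpha + 1.0
    head = sum(k ** -s for k in range(1, terms + 1))
    tail = (terms + 0.5) ** (1.0 - s) / (s - 1.0)
    return 1.0 / (head + tail)

def zeta_pmf(k, alpha):
    """P{X = k} = C / k^(alpha + 1) for k = 1, 2, ..."""
    return zeta_constant(alpha) / k ** (alpha + 1.0)

print(zeta_constant(1.0), 6 / math.pi**2)
```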

The zeta distribution was used by the Italian economist V. Pareto to describe 
the distribution of family incomes in a given country. However, it was G. K. Zipf 
who applied zeta distribution to a wide variety of problems in different areas and, in 
doing so, popularized its use. 


4.9 Expected Value of Sums of Random Variables 


A very important property of expectations is that the expected value of a sum of 
random variables is equal to the sum of their expectations. In this section, we will 
prove this result under the assumption that the set of possible values of the probability
experiment, that is, the sample space S, is either finite or countably infinite.
Although the result is true without this assumption (and a proof is outlined in the 
theoretical exercises), not only will the assumption simplify the argument, but it will 
also result in an enlightening proof that will add to our intuition about expectations. 
So, for the remainder of this section, suppose that the sample space S is either a finite 
or a countably infinite set. 

For a random variable X, let X(s) denote the value of X when s ∈ S is the
outcome of the experiment. Now, if X and Y are both random variables, then so 
is their sum. That is, Z = X + Y is also a random variable. Moreover, Z(s) = 
X(s) + Y(s). 


Suppose that the experiment consists of flipping a coin 5 times, with the outcome 
being the resulting sequence of heads and tails. Suppose X is the number of heads 
in the first 3 flips and Y is the number of heads in the final 2 flips. Let Z = X + Y. 
Then, for instance, for the outcome s = (h,t,h,t,h), 


X(s) =2 
Y(s) =1 
Z(s) = X(s) + Y(s) =3 


meaning that the outcome (h, t, h, t, h) results in 2 heads in the first three flips, 1 head
in the final two flips, and a total of 3 heads in the five flips.


Let p(s) = P({s}) be the probability that s is the outcome of the experiment.
Because we can write any event A as the finite or countably infinite union of the
mutually exclusive events {s}, s ∈ A, it follows by the axioms of probability that
\[
P(A) = \sum_{s \in A} p(s)
\]
When A = S, the preceding equation gives
\[
1 = \sum_{s \in S} p(s)
\]


Now, let X be a random variable, and consider E[X]. Because X(s) is the value of X
when s is the outcome of the experiment, it seems intuitive that E[X], the weighted
average of the possible values of X with each value weighted by the probability that
X assumes that value, should equal a weighted average of the values X(s), s ∈ S,
with X(s) weighted by the probability that s is the outcome of the experiment. We
now prove this intuition.

Proposition 9.1
\[
E[X] = \sum_{s \in S} X(s)\,p(s)
\]


Proof Suppose that the distinct values of X are $x_i$, i ≥ 1. For each i, let $S_i$ be the
event that X is equal to $x_i$. That is, $S_i = \{s : X(s) = x_i\}$. Then,
\[
\begin{aligned}
E[X] &= \sum_i x_i P\{X = x_i\} \\
&= \sum_i x_i P(S_i) \\
&= \sum_i x_i \sum_{s \in S_i} p(s) \\
&= \sum_i \sum_{s \in S_i} x_i\,p(s) \\
&= \sum_i \sum_{s \in S_i} X(s)\,p(s) \\
&= \sum_{s \in S} X(s)\,p(s)
\end{aligned}
\]
where the final equality follows because $S_1, S_2, \ldots$ are mutually exclusive events
whose union is S.


Suppose that two independent flips of a coin that comes up heads with probability p
are made, and let X denote the number of heads obtained. Because
\[
\begin{aligned}
P(X = 0) &= P(t,t) = (1-p)^2, \\
P(X = 1) &= P(h,t) + P(t,h) = 2p(1-p), \\
P(X = 2) &= P(h,h) = p^2
\end{aligned}
\]
it follows from the definition of expected value that
\[
E[X] = 0\cdot(1-p)^2 + 1\cdot 2p(1-p) + 2\cdot p^2 = 2p
\]
which agrees with
\[
\begin{aligned}
E[X] &= X(h,h)\,p^2 + X(h,t)\,p(1-p) + X(t,h)\,(1-p)p + X(t,t)\,(1-p)^2 \\
&= 2p^2 + p(1-p) + (1-p)p \\
&= 2p
\end{aligned}
\]
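
The two ways of computing E[X] in this example, summing over the values of X and summing over the outcomes s, can be carried out side by side. This is a minimal sketch; the function names and the choice p = 1/3 are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def expectation_over_outcomes(X, prob, outcomes):
    """E[X] as the sum of X(s) p(s) over outcomes s (Proposition 9.1)."""
    return sum(X(s) * prob(s) for s in outcomes)

def expectation_over_values(X, prob, outcomes):
    """E[X] from the definition: sum of x P{X = x} over the distinct values x."""
    pmf = {}
    for s in outcomes:
        pmf[X(s)] = pmf.get(X(s), 0) + prob(s)
    return sum(x * q for x, q in pmf.items())

p = Fraction(1, 3)                          # assumed heads probability
outcomes = list(product("ht", repeat=2))    # (h,h), (h,t), (t,h), (t,t)

def prob(s):
    return p ** s.count("h") * (1 - p) ** s.count("t")

def heads(s):
    return s.count("h")

by_outcomes = expectation_over_outcomes(heads, prob, outcomes)
by_values = expectation_over_values(heads, prob, outcomes)
print(by_outcomes, by_values, 2 * p)
```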


We now prove the important and useful result that the expected value of a sum of 
random variables is equal to the sum of their expectations. 


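For random variables $X_1, X_2, \ldots, X_n$,
\[
E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]
\]

Proof Let $Z = \sum_{i=1}^{n} X_i$. Then, by Proposition 9.1,
\[
\begin{aligned}
E[Z] &= \sum_{s \in S} Z(s)\,p(s) \\
&= \sum_{s \in S} \bigl(X_1(s) + X_2(s) + \cdots + X_n(s)\bigr)\,p(s) \\
&= \sum_{s \in S} X_1(s)\,p(s) + \sum_{s \in S} X_2(s)\,p(s) + \cdots + \sum_{s \in S} X_n(s)\,p(s) \\
&= E[X_1] + E[X_2] + \cdots + E[X_n]
\end{aligned}
\]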


Find the expected value of the sum obtained when n fair dice are rolled. 


Solution Let X be the sum. We will compute E[X] by using the representation
\[
X = \sum_{i=1}^{n} X_i
\]
where $X_i$ is the upturned value on die i. Because $X_i$ is equally likely to be any of the
values from 1 to 6, it follows that
\[
E[X_i] = \sum_{i=1}^{6} i\,(1/6) = 21/6 = 7/2
\]
which yields the result
\[
E[X] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = 3.5n
\]
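
For small n the result can be confirmed by brute force, enumerating all $6^n$ equally likely outcomes. The sketch below does this with exact fractions; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def expected_sum_by_enumeration(n):
    """E[X] computed as the sum of X(s) p(s) over all 6^n outcomes."""
    weight = Fraction(1, 6**n)
    return sum(sum(roll) * weight for roll in product(range(1, 7), repeat=n))

def expected_sum_by_linearity(n):
    """E[X] = n * E[X_1] = 3.5 n, as derived in the text."""
    return n * Fraction(7, 2)

for n in range(1, 5):
    print(n, expected_sum_by_enumeration(n), expected_sum_by_linearity(n))
```

The enumeration grows as $6^n$, whereas the linearity argument needs only the mean of a single die.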


Find the expected total number of successes that result from n trials when trial i is a
success with probability $p_i$, i = 1, ..., n.


Solution Letting
\[
X_i = \begin{cases} 1, & \text{if trial } i \text{ is a success} \\ 0, & \text{if trial } i \text{ is a failure} \end{cases}
\]
we have the representation
\[
X = \sum_{i=1}^{n} X_i
\]
Consequently,
\[
E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} p_i
\]


Note that this result does not require that the trials be independent. It includes as a 
special case the expected value of a binomial random variable, which assumes inde- 
pendent trials and all p; = p, and thus has mean np. It also gives the expected value of 
a hypergeometric random variable representing the number of white balls selected 
when n balls are randomly selected, without replacement, from an urn of N balls of 
which m are white. We can interpret the hypergeometric as representing the number 
of successes in n trials, where trial i is said to be a success if the ith ball selected is
white. Because the ith ball selected is equally likely to be any of the N balls and thus 
has probability m/N of being white, it follows that the hypergeometric is the num- 
ber of successes in n trials in which each trial is a success with probability p = m/N. 
Hence, even though these hypergeometric trials are dependent, it follows from the 
result of Example 9d that the expected value of the hypergeometric is np = nm/N.


Example 9e
Derive an expression for the variance of the number of successful trials in
Example 9d, and apply it to obtain the variance of a binomial random variable with
parameters n and p, and of a hypergeometric random variable equal to the number of white
balls chosen when n balls are randomly chosen from an urn containing N balls of
which m are white.


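Solution Letting X be the number of successful trials, and using the same
representation for X as in the previous example, namely $X = \sum_{i=1}^{n} X_i$, we have
\[
\begin{aligned}
E[X^2] &= E\left[\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{j=1}^{n} X_j\right)\right] \\
&= E\left[\sum_{i=1}^{n} X_i\left(X_i + \sum_{j \ne i} X_j\right)\right] \\
&= E\left[\sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n}\sum_{j \ne i} X_i X_j\right] \\
&= \sum_{i=1}^{n} E[X_i^2] + \sum_{i=1}^{n}\sum_{j \ne i} E[X_i X_j] \\
&= \sum_{i=1}^{n} p_i + \sum_{i=1}^{n}\sum_{j \ne i} E[X_i X_j]
\end{aligned}
\]
where the final equation used that $X_i^2 = X_i$. However, because the possible values
of both $X_i$ and $X_j$ are 0 or 1, it follows that
\[
X_i X_j = \begin{cases} 1, & \text{if } X_i = 1, X_j = 1 \\ 0, & \text{otherwise} \end{cases}
\]
Hence,
\[
E[X_i X_j] = P\{X_i = 1, X_j = 1\} = P(\text{trials } i \text{ and } j \text{ are successes})
\]
Thus, with $p_{i,j} = P(X_i = 1, X_j = 1)$, the preceding and the result of Example 9d yield
that
\[
\mathrm{Var}(X) = \sum_{i=1}^{n} p_i + \sum_{i=1}^{n}\sum_{j \ne i} p_{i,j} - \left(\sum_{i=1}^{n} p_i\right)^2 \qquad (9.1)
\]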
If X is binomial with parameters n, p, then $p_i = p$ and, by the independence of
trials, $p_{i,j} = p^2$, i ≠ j. Consequently, Equation (9.1) yields that
\[
\mathrm{Var}(X) = np + n(n-1)p^2 - n^2p^2 = np(1-p)
\]
On the other hand, if X is hypergeometric, then as each of the N balls is equally
likely to be the ith ball chosen, it follows that $p_i = m/N$. Also, for i ≠ j,
\[
p_{i,j} = P\{X_i = 1, X_j = 1\} = P\{X_i = 1\}\,P\{X_j = 1 \mid X_i = 1\} = \frac{m}{N}\,\frac{m-1}{N-1}
\]
which follows because given that the ith ball selected is white, each of the other N - 1
balls, of which m - 1 are white, is equally likely to be the jth ball selected.
Consequently, (9.1) yields that
\[
\mathrm{Var}(X) = \frac{nm}{N} + \frac{n(n-1)\,m(m-1)}{N(N-1)} - \left(\frac{nm}{N}\right)^2
\]
which, as shown in Example 8j, can be simplified to yield
\[
\mathrm{Var}(X) = np(1-p)\left(1 - \frac{n-1}{N-1}\right)
\]
where p = m/N.
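
Equation (9.1) needs only the single-trial probabilities $p_i$ and the pairwise probabilities $p_{i,j}$, so it translates directly into a short routine. The sketch below is illustrative: the function name `variance_from_pairs` and the parameter values n = 5, N = 20, m = 8 are assumptions, and it evaluates (9.1) for the binomial and hypergeometric cases treated above.

```python
from fractions import Fraction

def variance_from_pairs(p, p_pair):
    """Equation (9.1): sum of p_i, plus the sum of p_{i,j} over ordered
    pairs i != j, minus the square of the sum of p_i."""
    n = len(p)
    pair_sum = sum(p_pair(i, j) for i in range(n) for j in range(n) if i != j)
    return sum(p) + pair_sum - sum(p) ** 2

n, N, m = 5, 20, 8
p = Fraction(m, N)

# Binomial: independent trials, so p_{i,j} = p^2.
binom_var = variance_from_pairs([p] * n, lambda i, j: p * p)

# Hypergeometric: p_{i,j} = (m/N) * (m-1)/(N-1).
hyper_var = variance_from_pairs([p] * n, lambda i, j: p * Fraction(m - 1, N - 1))

print(binom_var, hyper_var)
```

Because the routine never uses independence, it applies equally to dependent trials, provided the pairwise success probabilities are known.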


4.10 Properties of the Cumulative Distribution Function 


Recall that for the distribution function F of X, F(b) denotes the probability that the 
random variable X takes on a value that is less than or equal to b. The following are 
some properties of the cumulative distribution function (c.d.f.) F: 


1. F is a nondecreasing function; that is, if a < b, then F(a) ≤ F(b).
2. $\lim_{b \to \infty} F(b) = 1$.
3. $\lim_{b \to -\infty} F(b) = 0$.
4. F is right continuous. That is, for any b and any decreasing sequence $b_n$, n ≥ 1,
that converges to b, $\lim_{n \to \infty} F(b_n) = F(b)$.


Property 1 follows, as was noted in Section 4.1, because, for a < b, the event
{X ≤ a} is contained in the event {X ≤ b} and so cannot have a larger probability.
Properties 2, 3, and 4 all follow from the continuity property of probabilities
(Section 2.6). For instance, to prove property 2, we note that if $b_n$ increases to ∞,
then the events $\{X \le b_n\}$, n ≥ 1, are increasing events whose union is the event
{X < ∞}. Hence, by the continuity property of probabilities,
\[
\lim_{n \to \infty} P\{X \le b_n\} = P\{X < \infty\} = 1
\]
which proves property 2.

The proof of property 3 is similar and is left as an exercise. To prove property 4,
we note that if $b_n$ decreases to b, then $\{X \le b_n\}$, n ≥ 1, are decreasing events whose
intersection is {X ≤ b}. The continuity property then yields
\[
\lim_{n \to \infty} P\{X \le b_n\} = P\{X \le b\}
\]
which verifies property 4.


All probability questions about X can be answered in terms of the c.d.f., F. For
example,
\[
P\{a < X \le b\} = F(b) - F(a) \quad \text{for all } a < b \qquad (10.1)
\]
This equation can best be seen to hold if we write the event {X ≤ b} as the union of
the mutually exclusive events {X ≤ a} and {a < X ≤ b}. That is,
\[
\{X \le b\} = \{X \le a\} \cup \{a < X \le b\}
\]
so
\[
P\{X \le b\} = P\{X \le a\} + P\{a < X \le b\}
\]
which establishes Equation (10.1).
If we want to compute the probability that X is strictly less than b, we can again
apply the continuity property to obtain
\[
P\{X < b\} = P\left(\lim_{n \to \infty}\left\{X \le b - \frac{1}{n}\right\}\right)
= \lim_{n \to \infty} P\left\{X \le b - \frac{1}{n}\right\}
= \lim_{n \to \infty} F\left(b - \frac{1}{n}\right)
\]
Note that P{X < b} does not necessarily equal F(b), since F(b) also includes the
probability that X equals b.


Example 10a The distribution function of the random variable X is given by
\[
F(x) = \begin{cases}
0 & x < 0 \\
\dfrac{x}{2} & 0 \le x < 1 \\
\dfrac{2}{3} & 1 \le x < 2 \\
\dfrac{11}{12} & 2 \le x < 3 \\
1 & 3 \le x
\end{cases}
\]
A graph of F(x) is presented in Figure 4.10. Compute (a) P{X < 3}, (b) P{X = 1},
(c) P{X > 1/2}, and (d) P{2 < X ≤ 4}.


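Solution
\[
\begin{aligned}
\text{(a)}\quad & P\{X < 3\} = \lim_{n} P\left\{X \le 3 - \frac{1}{n}\right\} = \lim_{n} F\left(3 - \frac{1}{n}\right) = \frac{11}{12} \\
\text{(b)}\quad & P\{X = 1\} = P\{X \le 1\} - P\{X < 1\} = F(1) - \lim_{n} F\left(1 - \frac{1}{n}\right) = \frac{2}{3} - \frac{1}{2} = \frac{1}{6} \\
\text{(c)}\quad & P\left\{X > \frac{1}{2}\right\} = 1 - P\left\{X \le \frac{1}{2}\right\} = 1 - F\left(\frac{1}{2}\right) = \frac{3}{4} \\
\text{(d)}\quad & P\{2 < X \le 4\} = F(4) - F(2) = \frac{1}{12}
\end{aligned}
\]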
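
The four computations can be mirrored in code by storing the distribution function piece by piece, so that both F(b) and the left limit F(b-) are available exactly. This is a sketch under the assumption that F is the piecewise function of this example (x/2 on [0, 1), 2/3 on [1, 2), 11/12 on [2, 3), and 1 from 3 onward); the names `PIECES`, `F`, and `F_left` are hypothetical.

```python
from fractions import Fraction

# Each piece is (lower, upper, formula) and applies on lower <= x < upper.
PIECES = [
    (0, 1, lambda x: Fraction(x) / 2),
    (1, 2, lambda x: Fraction(2, 3)),
    (2, 3, lambda x: Fraction(11, 12)),
]

def F(x):
    """F(x) = P{X <= x}."""
    if x < 0:
        return Fraction(0)
    for lower, upper, formula in PIECES:
        if lower <= x < upper:
            return formula(x)
    return Fraction(1)

def F_left(b):
    """Left limit of F at b, which equals P{X < b}."""
    if b <= 0:
        return Fraction(0)
    for lower, upper, formula in PIECES:
        if lower < b <= upper:
            return formula(b)   # each formula is continuous on its own piece
    return Fraction(1)

a = F_left(3)                   # P{X < 3}
b = F(1) - F_left(1)            # P{X = 1}
c = 1 - F(Fraction(1, 2))       # P{X > 1/2}
d = F(4) - F(2)                 # P{2 < X <= 4}
print(a, b, c, d)
```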


Summary 


A real-valued function defined on the outcome of a probability
experiment is called a random variable.
If X is a random variable, then the function F(x)
defined by
\[
F(x) = P\{X \le x\}
\]
is called the distribution function of X. All probabilities
concerning X can be stated in terms of F.

A random variable whose set of possible values is
either finite or countably infinite is called discrete. If X is a
discrete random variable, then the function
\[
p(x) = P\{X = x\}
\]
is called the probability mass function of X. Also, the quantity
E[X] defined by
\[
E[X] = \sum_{x:\,p(x)>0} x\,p(x)
\]
is called the expected value of X. E[X] is also commonly
called the mean or the expectation of X.
A useful identity states that for a function g,
\[
E[g(X)] = \sum_{x:\,p(x)>0} g(x)\,p(x)
\]
The variance of a random variable X, denoted by Var(X),
is defined by
\[
\mathrm{Var}(X) = E[(X - E[X])^2]
\]
The variance, which is equal to the expected square of
the difference between X and its expected value, is a measure
of the spread of the possible values of X. A useful
identity is
\[
\mathrm{Var}(X) = E[X^2] - (E[X])^2
\]


Figure 4.10 Graph of F(x).


The quantity $\sqrt{\mathrm{Var}(X)}$ is called the standard deviation
of X.

We now note some common types of discrete random
variables. The random variable X whose probability mass
function is given by
\[
p(i) = \binom{n}{i} p^i (1-p)^{n-i}, \qquad i = 0, \ldots, n
\]
is said to be a binomial random variable with parameters n
and p. Such a random variable can be interpreted as being
the number of successes that occur when n independent
trials, each of which results in a success with probability p,
are performed. Its mean and variance are given by
\[
E[X] = np \qquad \mathrm{Var}(X) = np(1-p)
\]
The random variable X whose probability mass function is
given by
\[
p(i) = \frac{e^{-\lambda}\lambda^i}{i!}, \qquad i \ge 0
\]
is said to be a Poisson random variable with parameter λ.
If a large number of (approximately) independent trials
are performed, each having a small probability of being
successful, then the number of successful trials that result
will have a distribution that is approximately that of a Poisson
random variable. The mean and variance of a Poisson
random variable are both equal to its parameter λ. That is,
\[
E[X] = \mathrm{Var}(X) = \lambda
\]
The random variable X whose probability mass function is
given by
\[
p(i) = p(1-p)^{i-1}, \qquad i = 1, 2, \ldots
\]
is said to be a geometric random variable with parameter
p. Such a random variable represents the trial number of
the first success when each trial is independently a success
with probability p. Its mean and variance are given by
\[
E[X] = \frac{1}{p} \qquad \mathrm{Var}(X) = \frac{1-p}{p^2}
\]
The random variable X whose probability mass function is
given by
\[
p(i) = \binom{i-1}{r-1} p^r (1-p)^{i-r}, \qquad i \ge r
\]
is said to be a negative binomial random variable with
parameters r and p. Such a random variable represents the
trial number of the rth success when each trial is independently
a success with probability p. Its mean and variance
are given by
\[
E[X] = \frac{r}{p} \qquad \mathrm{Var}(X) = \frac{r(1-p)}{p^2}
\]
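
The mean and variance formulas summarized above can be spot-checked by summing directly over a mass function. The sketch below does so for the binomial (exactly) and the Poisson (with the infinite sum truncated, which is an approximation); the function names, the parameter values, and the truncation point are assumptions for illustration.

```python
from fractions import Fraction
from math import comb, exp, factorial

def binomial_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson_pmf(i, lam):
    return exp(-lam) * lam**i / factorial(i)

def mean_and_variance(pmf, support):
    """E[X] and Var(X) = E[X^2] - (E[X])^2, summed over the given support."""
    mean = sum(i * pmf(i) for i in support)
    second_moment = sum(i * i * pmf(i) for i in support)
    return mean, second_moment - mean**2

# Binomial with n = 10, p = 1/3: exact rational arithmetic.
n, p = 10, Fraction(1, 3)
b_mean, b_var = mean_and_variance(lambda i: binomial_pmf(i, n, p), range(n + 1))

# Poisson with lambda = 3: support truncated at 100 terms (assumed sufficient).
lam = 3.0
p_mean, p_var = mean_and_variance(lambda i: poisson_pmf(i, lam), range(100))

print(b_mean, b_var, p_mean, p_var)
```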


Problems 


4.1. Sophie is choosing two coins randomly from a box 
containing four $2, five $1, eight 50¢, and three 20¢ coins. 
Let X denote Sophie’s income. What are the possible val- 
ues of X, and what are the probabilities associated with 
each value? 


4.2. Two fair dice are rolled. Let X equal the ratio of the 
value on the first die to that on the second die. Find the 
probabilities attached to the possible values that X can 
take on. 


4.3. Three fair dice are rolled. Assume that all 6^3 = 216
possible outcomes are equally likely. Let X equal the prod- 
uct of the 3 dice. Find the probabilities attached to the 
possible values that X can take on. 


4.4. Six men and 4 women are ranked according to the 
time they took to complete a 5-mile trail run. Assume that 
no two individuals took the same time and that all 10! pos- 
sible rankings are equally likely. What is the probability 
that at least one out of the three highest ranking individu- 
als is a woman? 


4.5. Let X represent the difference between the number 
of heads and the number of tails obtained when a coin is 
tossed n times. What are the possible values of X? 


4.6. In Problem 4.5, for n = 3, if the coin is assumed fair, 
what are the probabilities associated with the values that 
X can take on? 
A hypergeometric random variable X with parameters n,
N, and m represents the number of white balls selected
when n balls are randomly chosen from an urn that contains
N balls of which m are white. The probability mass
function of this random variable is given by
\[
p(i) = \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}, \qquad i = 0, \ldots, m
\]
With p = m/N, its mean and variance are
\[
E[X] = np \qquad \mathrm{Var}(X) = \frac{N-n}{N-1}\,np(1-p)
\]
An important property of the expected value is that the
expected value of a sum of random variables is equal to
the sum of their expected values. That is,
\[
E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]
\]


4.7. Suppose that a die is rolled twice. What are the 
possible values that the following random variables can 
take on: 

(a) the maximum value to appear in the two rolls; 

(b) the minimum value to appear in the two rolls; 

(c) the sum of the two rolls; 


(d) the value of the first roll minus the value of the second 
roll? 


4.8. If the die in Problem 4.7 is assumed fair, calculate the 
probabilities associated with the random variables in parts 
(a) through (d). 


4.9. Repeat Example 1c when the balls are selected with
replacement. 


4.10. Let X be the winnings of a gambler. Let p(i) = 
P(X = i) and suppose that 


p(0) = 1/3; p(1) = p(-1) = 13/55;
p(2) = p(-2) = 1/11; p(3) = p(-3) = 1/165


Compute the conditional probability that the gambler wins 
i, i= 1,2,3, given that he wins a positive amount. 


4.11. The random variable X is said to follow the distribution
of Benford's Law if
\[
P(X = i) = \log_{10}\left(\frac{i+1}{i}\right), \qquad i = 1, 2, 3, \ldots, 9
\]
It has been shown to be a good fit for the distribution of
the first digit of many real-life data values.


(a) Verify that the preceding is a probability mass function
by showing that $\sum_{i=1}^{9} P(X = i) = 1$.
(b) Find P(X ≤ j).


4.12. In the game of Two-Finger Morra, 2 players show 1 
or 2 fingers and simultaneously guess the number of fin- 
gers their opponent will show. If only one of the players 
guesses correctly, he wins an amount (in dollars) equal to 
the sum of the fingers shown by him and his opponent. If 
both players guess correctly or if neither guesses correctly, 
then no money is exchanged. Consider a specified player, 
and denote by X the amount of money he wins in a single 
game of Two-Finger Morra. 


(a) If each player acts independently of the other, and if 
each player makes his choice of the number of fingers he 
will hold up and the number he will guess that his oppo- 
nent will hold up in such a way that each of the 4 possibili- 
ties is equally likely, what are the possible values of X and 
what are their associated probabilities? 


(b) Suppose that each player acts independently of the 
other. If each player decides to hold up the same number 
of fingers that he guesses his opponent will hold up, and 
if each player is equally likely to hold up 1 or 2 fingers, 
what are the possible values of X and their associated 
probabilities? 


4.13. A man wants to buy tablets for his two daughters as 
Christmas gifts. He goes to an electronics shop that has the 
two latest models. The probability that Sabrina, the older 
daughter, will accept the gift is .9, whereas the probability 
that Samantha, the younger daughter, will accept the gift 
is .7. These two probabilities are independent. There is a .8
probability that Sabrina will choose the first model, which 
costs $600, and a .2 probability that she chooses the second 
model, which costs $450. Samantha is equally likely to opt 
for either model. Determine the expected total cost that 
will be incurred by the man. 


4.14. Five distinct numbers are randomly distributed to 
players numbered 1 through 5. Whenever two players 
compare their numbers, the one with the higher one is 
declared the winner. Initially, players 1 and 2 compare 
their numbers; the winner then compares her number with 
that of player 3, and so on. Let X denote the number of 
times player 1 is a winner. Find P{X = i}, i = 0, 1, 2, 3, 4.


4.15. A state wants to select 10 players along with a goal- 
keeper from 9 football teams who will represent their state 
in the national league. An urn consisting of 45 balls is 
used for the selection of the players. Each of the balls is 
inscribed with the name of a team: 9 balls have the name 
of the best-performing team, 8 balls have the name of the 
second best-performing team, and so on (with 1 ball for
the worst-performing team). A ball is chosen at random,
and the team whose name is on the ball is instructed to 
pick a player who will join the state’s team. Another ball is 
then chosen at random and the team named on the ball 
is asked to pick a player. A third ball is randomly cho- 
sen and the team named on the ball (provided that not 
all 3 chosen balls are of the same team) is asked to choose 
the third player. If 3 balls are chosen from the same team, 
the third ball is replaced and another one is chosen. This 
continues until a ball from another team is chosen. The 7 
remaining players are then picked in a way from the teams 
that were not picked from the urn such that all 9 teams 
are represented at least once. If all 3 chosen balls are of 
a different team, then 2 out of the 7 remaining players 
are selected out of the best-performing team which was 
not chosen from the urn. What is the probability that the 
third best-performing team in the competition will have 
two representative players? 


4.16. A deck of n cards numbered 1 through n are to be 
turned over one at a time. Before each card is shown you are
to guess which card it will be. After making your guess, 
you are told whether or not your guess is correct but not 
which card was turned over. It turns out that the strategy 
that maximizes the expected number of correct guesses 
fixes a permutation of the n cards, say 1,2,...,n, and then 
continually guesses 1 until it is correct, then continually 
guesses 2 until either it is correct or all cards have been 
turned over, and then continually guesses 3, and so on. Let 
G denote the number of correct guesses yielded by this 
strategy. Determine P(G = k). 


Hint: In order for G to be at least k, what must be the order
of cards 1,...,k?


4.17. Suppose that the distribution function of the random 
variable X is given by 


0 x < 0 
mos 5 (2ce4 
Pe 
ered 1sx <3 
4 
1 x23 
(a) Find P(X < 1}. 
(b) Find P(X > 2}. 
(c) Find P{ 4 2X < 3. 


4.18. During a tournament, a football team plays a match 
against 3 different teams. The probabilities that this team 
wins against the first, second, and third teams are .8, .65, 
and .3, respectively, and are independent. Let X denote 
the number of wins obtained. Calculate the probability 
mass function of X. 


4.19. If the distribution function of the random variable X 
is given by 


0 x <l 

1 

4 1sx<3 

5 

3 3sx<4 
F(x) = 3 

Z 4=x <6 

7 

8 6=x <7 

1 x27 


calculate the probability mass function of X. 


4.20. A gambling book recommends the following “win- 
ning strategy” for the game of roulette: Bet $1 on red. If 
red appears (which has probability 18/38), then take the $1
profit and quit. If red does not appear and you lose this bet
(which has probability 20/38 of occurring), make additional
$1 bets on red on each of the next two spins of the roulette 
wheel and then quit. Let X denote your winnings when 
you quit. 

(a) Find P{X > 0}. 

(b) Are you convinced that the strategy is indeed a “win- 
ning” strategy? Explain your answer! 

(c) Find E[X]. 


4.21. Four buses carrying 148 students from the same 
school arrive at a football stadium. The buses carry, respec- 
tively, 40, 33, 25, and 50 students. One of the students is 
randomly selected. Let X denote the number of students 
who were on the bus carrying the randomly selected stu- 
dent. One of the 4 bus drivers is also randomly selected. 
Let Y denote the number of students on her bus. 


(a) Which of E[X] or E[Y] do you think is larger? Why?
(b) Compute E[X] and E[Y].


4.22. Suppose that two teams play a series of games that 
ends when one of them has won i games. Suppose that 
each game played is, independently, won by team A with 
probability p. Find the expected number of games that are 
played when (a) i = 2 and (b) i = 3. Also, show in both 
cases that this number is maximized when p = 1/2.


4.23. You have $1000, and a certain commodity presently 
sells for $2 per ounce. Suppose that after one week the 
commodity will sell for either $1 or $4 an ounce, with these 
two possibilities being equally likely. 

(a) If your objective is to maximize the expected amount 
of money that you possess at the end of the week, what 
strategy should you employ? 
(b) If your objective is to maximize the expected amount
of the commodity that you possess at the end of the week, 
what strategy should you employ? 


4.24. A and B play the following game: A writes down 
either number 1 or number 2, and B must guess which 
one. If the number that A has written down is i and B has
guessed correctly, B receives i units from A. If B makes a
wrong guess, B pays 3/4 unit to A. If B randomizes his decision
by guessing 1 with probability p and 2 with probability
1 — p, determine his expected gain if (a) A has written 
down number 1 and (b) A has written down number 2. 

What value of p maximizes the minimum possible value 
of B’s expected gain, and what is this maximin value? 
(Note that B’s expected gain depends not only on p, but 
also on what A does.) 

Consider now player A. Suppose that she also random- 
izes her decision, writing down number 1 with probability 
q. What is A’s expected loss if (c) B chooses number 1 and 
(d) B chooses number 2? 

What value of q minimizes A's maximum expected loss?
Show that the minimum of A’s maximum expected loss 
is equal to the maximum of B’s minimum expected gain. 
This result, known as the minimax theorem, was first 
established in generality by the mathematician John von 
Neumann and is the fundamental result in the mathemati- 
cal discipline known as the theory of games. The common 
value is called the value of the game to player B. 


4.25. Four coins are flipped. The first two coins are fair, 
whereas the third and fourth coins are biased. The latter 
coins land on heads with probabilities .7 and .4, respec- 
tively. Assume that the outcomes of the flips are indepen- 
dent. Find the probability that 

(a) exactly one head appears; 

(b) two heads appear. 

4.26. One of the numbers 1 through 10 is randomly cho- 
sen. You are to try to guess the number chosen by asking 
questions with “yes—no” answers. Compute the expected 
number of questions you will need to ask in each of the 
following two cases: 

(a) Your ith question is to be "Is it i?", i = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
(b) With each question you try to eliminate one-half of the
remaining numbers, as nearly as possible.


4.27. An insurance company writes a policy to the effect 
that an amount of money A must be paid if some event 
E occurs within a year. If the company estimates that E 
will occur within a year with probability p, what should it 
charge the customer in order that its expected profit will 
be 10 percent of A? 


4.28. A teacher selects a group of 5 students at random 
from her class consisting of 11 female students and 10 male 
students. Find the expected number of female students in 
the group. 
4.29. There are two possible causes for a breakdown of a
machine. To check the first possibility would cost $C_1$ dollars,
and, if that were the cause of the breakdown, the
trouble could be repaired at a cost of $R_1$ dollars. Similarly,
there are costs $C_2$ and $R_2$ associated with the second possibility.
Let p and 1 - p denote, respectively, the probabilities
that the breakdown is caused by the first and second
possibilities. Under what conditions on p, $C_i$, $R_i$, i = 1, 2,
should we check the first possible cause of breakdown 
and then the second, as opposed to reversing the check- 
ing order, so as to minimize the expected cost involved in 
returning the machine to working order? 

Note: If the first check is negative, we must still check the 
other possibility. 


4.30. A person tosses a fair coin until a tail appears for the 
first time. If the tail appears on the nth flip, the person wins 
2^n dollars. Let X denote the player's winnings. Show that
E[X] = +∞. This problem is known as the St. Petersburg
paradox. 

(a) Would you be willing to pay $1 million to play this 
game once? 

(b) Would you be willing to pay $1 million for each game 
if you could play for as long as you liked and only had to 
settle up when you stopped playing? 


4.31. Each night different meteorologists give us the prob- 
ability that it will rain the next day. To judge how well these 
people predict, we will score each of them as follows: If a 
meteorologist says that it will rain with probability p, then 
he or she will receive a score of 


$$
\begin{cases}
1 - (1 - p)^2 & \text{if it does rain} \\
1 - p^2 & \text{if it does not rain}
\end{cases}
$$


We will then keep track of scores over a certain time span 
and conclude that the meteorologist with the highest aver- 
age score is the best predictor of weather. Suppose now 
that a given meteorologist is aware of our scoring mecha- 
nism and wants to maximize his or her expected score. If 
this person truly believes that it will rain tomorrow with 
probability p*, what value of p should he or she assert so 
as to maximize the expected score? 


4.32. To determine whether they have a certain disease, 
100 people are to have their blood tested. However, rather 
than testing each individual separately, it has been decided 
first to place the people into groups of 10. The blood sam- 
ples of the 10 people in each group will be pooled and 
analyzed together. If the test is negative, one test will suf- 
fice for the 10 people, whereas if the test is positive, each 
of the 10 people will also be individually tested and, in all, 
11 tests will be made on this group. Assume that the prob- 
ability that a person has the disease is .1 for all people, 
independently of one another, and compute the expected 
number of tests necessary for each group. (Note that we
are assuming that the pooled test will be positive if at least
one person in the pool has the disease.) 
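
As a check on a hand calculation for Problem 4.32, the expectation can be computed directly. The sketch below takes the problem's assumptions as given (independent infections, a pool is positive exactly when at least one member is infected); the function name `expected_tests_per_group` is our own and does not come from the text.

```python
def expected_tests_per_group(group_size: int, p_disease: float) -> float:
    """Expected number of tests for one pooled group.

    One pooled test is always run. If it is positive (at least one of the
    group_size people has the disease), group_size individual tests follow.
    Assumes people are infected independently with probability p_disease.
    """
    p_pool_negative = (1.0 - p_disease) ** group_size
    return 1.0 + group_size * (1.0 - p_pool_negative)


if __name__ == "__main__":
    # Values from Problem 4.32: groups of 10, disease probability .1
    print(expected_tests_per_group(10, 0.1))
```

Comparing the result with 10, the cost of testing everyone individually, shows whether pooling helps for a given disease probability.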


4.33. A newsboy purchases papers at 10 cents and sells 
them at 15 cents. However, he is not allowed to return 
unsold papers. If his daily demand is a binomial random 
variable with n = 10,p = i, approximately how many 
papers should he purchase so as to maximize his expected 
profit? 


4.34. In Example 4b, suppose that the department store 
incurs an additional cost of c for each unit of unmet 
demand. (This type of cost is often referred to as a 
goodwill cost because the store loses the goodwill of 
those customers whose demands it cannot meet.) Com- 
pute the expected profit when the store stocks s units, 
and determine the value of s that maximizes the expected 
profit. 


4.35. Two cards are drawn at random from an ordinary 
deck of 52 playing cards. If the two cards display the same 
suit, you win $2. If they are of the same color only, you win 
$1. Otherwise, you lose 50¢. Calculate


(a) the expected value of the amount you win; 
(b) the variance of the amount you win. 


4.36. Consider the friendship network described by 
Figure 4.5. Let X be a randomly chosen person and let 
Z be a randomly chosen friend of X. With f(i) equal to
the number of friends of person i, show that E[f(Z)] ≥
E[f(X)].


4.37. Consider Problem 4.22 with i = 2. Find the variance 
of the number of games played, and show that this number 
is maximized when p = 1/2.


4.38. Find Var(X) and Var(Y) for X and Y as given in 
Problem 4.21. 


4.39. If E[X] = 3 and Var(X) = 1, find
(a) $E[(4X - 1)^2]$;
(b) Var(5 − 2X).


4.40. A card is drawn at random from an ordinary deck of 
52 playing cards. After the card is drawn, it is replaced. The 
deck is reshuffled and another card is drawn at random. 
This process goes on indefinitely. What is the probability 
that exactly 3 out of the first 5 cards that have been drawn 
are red? 


4.41. On a multiple-choice test with 3 possible answers for 
each of the 8 questions, what is the probability that a stu- 
dent who has not studied for the test will guess at least 6 
correct answers? 


4.42. A fair die is tossed 10 times consecutively. What is 
the probability that at most 6 out of the 10 tosses result in 
an even number? 


4.43. A and B will take the same 10-question examination. 
Each question will be answered correctly by A with prob- 
ability .7, independently of her results on other questions. 
Each question will be answered correctly by B with prob- 
ability .4, independently both of her results on the other 
questions and on the performance of A. 


(a) Find the expected number of questions that are 
answered correctly by both A and B. 


(b) Find the variance of the number of questions that are 
answered correctly by either A or B. 


4.44. A communications channel transmits the digits 0 and 
1. However, due to static, the digit transmitted is incor- 
rectly received with probability .2. Suppose that we want 
to transmit an important message consisting of one binary 
digit. To reduce the chance of error, we transmit 00000 
instead of 0 and 11111 instead of 1. If the receiver of the 
message uses “majority” decoding, what is the probabil- 
ity that the message will be wrong when decoded? What 
independence assumptions are you making? 
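
A short numerical sketch for Problem 4.44 follows. It assumes that each transmitted digit is corrupted independently of the others, which is itself one possible answer to the question about independence assumptions; the helper name `majority_error_probability` is hypothetical.

```python
from math import comb


def majority_error_probability(n_digits: int, p_flip: float) -> float:
    """P(more than half of n_digits digits are received incorrectly).

    Assumes each digit is flipped independently with probability p_flip,
    so the number of flipped digits is binomial(n_digits, p_flip).
    Majority decoding fails exactly when a majority of digits are flipped.
    """
    threshold = n_digits // 2 + 1
    return sum(
        comb(n_digits, k) * p_flip**k * (1.0 - p_flip) ** (n_digits - k)
        for k in range(threshold, n_digits + 1)
    )


if __name__ == "__main__":
    # Values from Problem 4.44: 5 repeated digits, error probability .2
    print(majority_error_probability(5, 0.2))
```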


4.45. A satellite system consists of n components and 
functions on any given day if at least k of the n com- 
ponents function on that day. On a rainy day, each of 
the components independently functions with probability 
$p_1$, whereas on a dry day, each independently functions
with probability $p_2$. If the probability of rain tomorrow
is α, what is the probability that the satellite system will
function? 


4.46. A student is getting ready to take an important oral 
examination and is concerned about the possibility of having
an “on” day or an “off” day. He figures that if he has
an on day, then each of his examiners will pass him, inde- 
pendently of one another, with probability .8, whereas if 
he has an off day, this probability will be reduced to .4. 
Suppose that the student will pass the examination if a 
majority of the examiners pass him. If the student believes 
that he is twice as likely to have an off day as he is to have 
an on day, should he request an examination with 3 exam- 
iners or with 5 examiners? 


4.47. Suppose that it takes at least 9 votes from a 12- 
member jury to convict a defendant. Suppose also that the 
probability that a juror votes a guilty person innocent is 
.2, whereas the probability that the juror votes an innocent 
person guilty is .1. If each juror acts independently and if 
65 percent of the defendants are guilty, find the probability 
that the jury renders a correct decision. What percentage 
of defendants is convicted? 


4.48. In some military courts, 9 judges are appointed. 
However, both the prosecution and the defense attorneys 
are entitled to a peremptory challenge of any judge, in 
which case that judge is removed from the case and is
not replaced. A defendant is declared guilty if the majority
of judges cast votes of guilty, and he or she is declared
innocent otherwise. Suppose that when the defendant is, 
in fact, guilty, each judge will (independently) vote guilty 
with probability .7 whereas when the defendant is, in fact, 
innocent, this probability drops to .3. 


(a) What is the probability that a guilty defendant is 
declared guilty when there are (i) 9, (ii) 8, and (iii) 7 
judges? 

(b) Repeat part (a) for an innocent defendant. 

(c) If the prosecuting attorney does not exercise the right 
to a peremptory challenge of a judge, and if the defense 
is limited to at most two such challenges, how many chal- 
lenges should the defense attorney make if he or she is 60 
percent certain that the client is guilty? 


4.49. A company sells LED bulbs in packages of 20 for 
$25. From past records, it knows that a bulb will be defec- 
tive with probability .01. The company agrees to pay a full 
refund if a customer finds more than 1 defective bulb in a 
pack. If the company sells 100 packs, how much should it 
expect to refund? 


4.50. When coin 1 is flipped, it lands on heads with probability
.4; when coin 2 is flipped, it lands on heads with
probability .7. One of these coins is randomly chosen and 
flipped 10 times. 


(a) What is the probability that the coin lands on heads on 
exactly 7 of the 10 flips? 

(b) Given that the first of these 10 flips lands heads, what 
is the conditional probability that exactly 7 of the 10 flips 
land on heads? 


4.51. Each member of a population of size n is, independently,
female with probability p or male with probability
1 − p. Let X be the number of the other n − 1 members
of the population that are the same sex as is person 1. (So
X = n − 1 if all n people are of the same sex.)


(a) Find P{X = i}, i = 0, ..., n − 1.


Now suppose that two people of the same sex will, independently
of other pairs, be friends with probability α;
whereas two persons of opposite sexes will be friends with
probability β. Find the probability mass function of the
number of friends of person 1. 


4.52. In a tournament involving players 1,2,3,4, players 1 
and 2 play a game, with the loser departing and the win- 
ner then playing against player 3, with the loser of that 
game departing and the winner then playing player 4. The 
winner of the game involving player 4 is the tournament 
winner. Suppose that a game between players i and j is 
won by player i with probability $\frac{i}{i+j}$.


(a) Find the expected number of games played by player 1.
(b) Find the expected number of games played by player 3. 


4.53. Suppose that Harry plays 10 rounds of tennis against 
Smith and wins with probability p during each round. 
Given that Harry has won a total of 7 rounds, find the con- 
ditional probability that his outcomes in the first 3 rounds 
are 


(a) win, win, lose; 
(b) lose, win, lose. 


4.54. The expected number of dancers falling on stage dur- 
ing a contest is .3. What is the probability that during the 
next contest, (a) no dancer falls on stage and (b) 3 or more 
dancers fall on stage? Explain your reasoning. 


4.55. The monthly worldwide average number of airplane 
crashes of commercial airlines is 3.5. What is the probabil- 
ity that there will be 


(a) at least 2 such accidents in the next month; 


(b) at most 1 accident in the next month? 
Explain your reasoning! 


4.56. Approximately 80,000 marriages took place in the 
state of New York last year. Estimate the probability that 
for at least one of these couples, 


(a) both partners were born on April 30; 
(b) both partners celebrated their birthday on the same
day of the year.
State your assumptions. 


4.57. Suppose that the average number of follower 
requests that an advertising page receives weekly is 50. 
Approximate the probability that the page will receive 


(a) exactly 35 follower requests in the next week; 
(b) at least 40 follower requests in the next week. 


4.58. An examination board appoints two vetters. The 
average number of errors per exam paper found by the 
first vetter is 4, and the average number of errors per exam 
paper found by the second vetter is 5. If an examiner’s 
paper is equally likely to be vetted by either vetter, approx- 
imate the probability that it will have no errors. 


4.59. How many people are needed so that the probability 
that at least one of them has the same first and last name 
initials as you is at least 39 


4.60. Suppose that the number of weekly traffic accidents 
occurring in a small town is a Poisson random variable with 
parameter λ = 7.

(a) What is the probability that at least 4 accidents occur 
(until) this week? 

(b) What is the probability that at most 5 accidents occur 
(until) this week given that at least 1 accident will occur 
today? 


4.61. Compare the Poisson approximation with the correct 
binomial probability for the following cases: 


(a) P{X = 2} when n = 8, p = .1;
(b) P{X = 9} when n = 10, p = .95;
(c) P{X = 0} when n = 10, p = .1;
(d) P{X = 4} when n = 9, p = .2.
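
The comparison asked for in Problem 4.61 can be tabulated with a few lines of code. This is a minimal sketch that uses the usual Poisson parameter λ = np; the function names `binomial_pmf` and `poisson_pmf` are our own.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability P{X = k} for parameters (n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability P{Y = k} for parameter lam."""
    return exp(-lam) * lam**k / factorial(k)


if __name__ == "__main__":
    # (k, n, p) triples for parts (a)-(d) of Problem 4.61
    cases = [(2, 8, 0.1), (9, 10, 0.95), (0, 10, 0.1), (4, 9, 0.2)]
    for k, n, p in cases:
        exact = binomial_pmf(k, n, p)
        approx = poisson_pmf(k, n * p)  # Poisson approximation with lam = n*p
        print(f"n={n}, p={p}, k={k}: binomial={exact:.4f}, Poisson={approx:.4f}")
```

The case p = .95 is a useful reminder that the approximation is intended for small p; one may instead apply it to the number of failures, which has the small parameter 1 − p.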


4.62. If you buy a lottery ticket in 50 lotteries, in each of 
which your chance of winning a prize is 1/100, what is the
(approximate) probability that you will win a prize


(a) at least once? 
(b) exactly once? 
(c) at least twice? 


4.63. The number of times that a person contracts a cold 
in a given year is a Poisson random variable with parameter
λ = 5. Suppose that a new wonder drug (based on
large quantities of vitamin C) has just been marketed that
reduces the Poisson parameter to λ = 3 for 75 percent
of the population. For the other 25 percent of the popu- 
lation, the drug has no appreciable effect on colds. If an 
individual tries the drug for a year and has 2 colds in that 
time, how likely is it that the drug is beneficial for him or 
her? 


4.64. While driving along a long route that has 5,000 inter- 
sections, the probability of encountering a red light at any 
intersection is .001. Find an approximation for the proba- 
bility that a driver will encounter at least 2 red lights. 


4.65. Consider n independent trials, each of which results 
in one of the outcomes 1, ..., k with respective probabilities
$p_1, \ldots, p_k$, $\sum_{i=1}^{k} p_i = 1$. Show that if all the $p_i$ are
small, then the probability that no trial outcome occurs
more than once is approximately equal to
$\exp\left(-n(n-1)\sum_i p_i^2/2\right)$.


4.66. Customers enter a supermarket located on a busy 
street at a rate of 2 every 3 minutes. 


(a) What is the probability that no customer enters the 
supermarket between 07:00 and 07:06? 


(b) What is the probability that at least 5 customers enter 
during this time? 


4.67. In a certain country, babies are born at an approx- 
imate rate of 6.94 births per 1,000 inhabitants per year. 
Assume that the total population is 40,000. 

(a) What is the probability that there will be more than 60 
births in this country during a 3-month period? 

(b) What is the probability that there will be more than 60 
births in at least 2 phases of 3 months during the next year?

(c) If the present season (a 3-month period) is identified
as Season 1, what is the probability that the first season to
have more than 60 births will be Season i (i = 1,2,3, 4)? 


4.68. Each of 500 soldiers in an army company indepen- 
dently has a certain disease with probability 1/10°. This 
disease will show up in a blood test, and to facilitate mat- 
ters, blood samples from all 500 soldiers are pooled and 
tested. 


(a) What is the (approximate) probability that the blood 
test will be positive (that is, at least one person has the 
disease)? 


Suppose now that the blood test yields a positive result. 


(b) What is the probability, under this circumstance, that 
more than one person has the disease? 


Now, suppose one of the 500 people is Jones, who knows 
that he has the disease. 


(c) What does Jones think is the probability that more than 
one person has the disease? 


Because the pooled test was positive, the authorities have 
decided to test each individual separately. The first i — 1 
of these tests were negative, and the ith one—which was 
on Jones—was positive. 

(d) Given the preceding scenario, what is the probability, 
as a function of i, that any of the remaining people have 
the disease? 


4.69. A total of 2n people, consisting of n married cou- 
ples, are randomly seated (all possible orderings being 
equally likely) at a round table. Let $C_i$ denote the event
that the members of couple i are seated next to each other,
i = 1, ..., n.

(a) Find $P(C_i)$.

(b) For j ≠ i, find $P(C_j \mid C_i)$.

(c) Approximate the probability, for n large, that there are 
no married couples who are seated next to each other. 


4.70. Repeat the preceding problem when the seating is 
random but subject to the constraint that the men and 
women alternate. 


4.71. In response to an attack of 10 missiles, 500 antiballistic
missiles are launched. The missile targets of the antiballistic
missiles are independent, and each antiballistic missile
is equally likely to go towards any of the target missiles. If 
each antiballistic missile independently hits its target with 
probability .1, use the Poisson paradigm to approximate 
the probability that all missiles are hit. 


4.72. A fair coin is flipped 10 times. Find the probability 
that there is a string of 4 consecutive heads by 

(a) using the formula derived in the text; 

(b) using the recursive equations derived in the text. 


(c) Compare your answer with that given by the Poisson 
approximation. 


4.73. At time 0, a coin that comes up heads with probability
p is flipped and falls to the ground. Suppose it lands
on heads. At times chosen according to a Poisson process
with rate λ, the coin is picked up and flipped. (Between
these times, the coin remains on the ground.) What is the
probability that the coin is on its head side at time t?


Hint: What would be the conditional probability if there
were no additional flips by time t, and what would it be if
there were additional flips by time t?


4.74. Consider a roulette wheel consisting of 38 numbers 1 
through 36, 0, and double 0. If Smith always bets that the 
outcome will be one of the numbers 1 through 12, what is 
the probability that 


(a) Smith will lose his first 5 bets; 
(b) his first win will occur on his fourth bet? 


4.75. Two athletic teams play a series of games; the first 
team to win 4 games is declared the overall winner. Sup- 
pose that one of the teams is stronger than the other and 
wins each game with probability .6, independently of the 
outcomes of the other games. Find the probability, for 
i = 4, 5, 6, 7, that the stronger team wins the series in 
exactly i games. Compare the probability that the stronger 
team wins with the probability that it would win a 2-out- 
of-3 series. 


4.76. Suppose in Problem 4.75 that the two teams are 
evenly matched and each has probability 1/2 of winning
each game. Find the expected number of games
played. 


4.77. An interviewer is given a list of people she can inter- 
view. If the interviewer needs to interview 5 people, and 
if each person (independently) agrees to be interviewed 
with probability z, what is the probability that her list of 
people will enable her to obtain her necessary number 
of interviews if the list consists of (a) 5 people and (b) 8 
people? For part (b), what is the probability that the inter- 
viewer will speak to exactly (c) 6 people and (d) 7 people 
on the list? 


4.78. During assembly, a product is equipped with 5 con- 
trol switches, each of which has probability .04 of being 
defective. What is the probability that 2 defective switches 
are encountered before 5 nondefective ones? 


4.79. Solve the Banach match problem (Example 8e) 
when the left-hand matchbox originally contained $N_1$
matches and the right-hand box contained $N_2$ matches.


4.80. In the Banach matchbox problem, find the probabil- 
ity that at the moment when the first box is emptied (as 
opposed to being found empty), the other box contains 
exactly k matches. 


4.81. An urn contains 4 red, 4 green, and 4 blue balls. We 
randomly choose 6 balls. If exactly two of them are red, 
we stop. Otherwise, we replace the balls in the urn and
randomly choose 6 balls again. What is the probability that
we shall stop exactly after n selections? 


4.82. Suppose that a class of 50 students has appeared for 
a test. Forty-one students have passed this test while the 
remaining 9 students have failed. Find the probability that 
in a group of 10 students selected at random 


(a) none have failed the test; 
(b) at least 3 students have failed the test. 


4.83. A game popular in Nevada gambling casinos is Keno, 
which is played as follows: Twenty numbers are selected at 
random by the casino from the set of numbers 1 through 
80. A player can select from 1 to 15 numbers; a win occurs 
if some fraction of the player’s chosen subset matches any 
of the 20 numbers drawn by the house. The payoff is a 
function of the number of elements in the player’s selec- 
tion and the number of matches. For instance, if the player 
selects only 1 number, then he or she wins if this number is 
among the set of 20, and the payoff is $2.20 won for every 
dollar bet. (As the player’s probability of winning in this 
case is 1/4, it is clear that the “fair” payoff should be $3 won
for every $1 bet.) When the player selects 2 numbers, a 
payoff (of odds) of $12 won for every $1 bet is made when 
both numbers are among the 20. 


(a) What would be the fair payoff in this case? 

Let $P_{n,k}$ denote the probability that exactly k of the n
numbers chosen by the player are among the 20 selected
by the house.

(b) Compute $P_{n,k}$.

(c) The most typical wager at Keno consists of selecting 10 
numbers. For such a bet, the casino pays off as shown in 
the following table. Compute the expected payoff: 


Keno Payoffs in 10 Number Bets

[Table: number of matches versus dollars won for each $1 bet; entries not recoverable.]


4.84. In Example 8i, what percentage of i defective lots
does the purchaser reject? Find it for i = 1, 4. Given that
a lot is rejected, what is the conditional probability that it
contained 4 defective components?


4.85. An automotive manufacturing company produces
brake pads in lots of 100. This company inspects 15 brake
pads from each lot and accepts the whole lot only if all
15 brake pads pass the inspection test. Each brake pad
is, independently of the others, faulty with probability .09.
What proportion of the lots does the company reject?


4.86. A neighborhood consists of five streets. Assume
that the numbers of daily traffic accidents that occur on
these streets are Poisson random variables with respective
parameters .45, .2, .4, .5, and .35. What is the expected
number of traffic accidents that will occur in this neighborhood
next Monday?


4.87. Suppose that a group of 15 female students is selecting
one shop out of the 6 available shops nearby to buy
their prom dress. Each student, independently of the others,
selects shop i with probability $p_i$, where $\sum_{i=1}^{6} p_i = 1$.


(a) What is the expected number of shops that will not be
visited by any student from this group?

(b) What is the expected number of shops that will be visited
by exactly 3 students from this group?


4.88. Martha makes a necklace by randomly selecting n
beads from a large jar containing beads of k different colors.
Independently of the selection of the previous bead,
Martha selects a bead of color i with probability $p_i$, where
$\sum_{i=1}^{k} p_i = 1$. What is the expected number of different
colored beads in the necklace?


4.89. An urn contains 10 red, 8 black, and 7 green balls.
One of the colors is chosen at random (meaning that the
chosen color is equally likely to be any of the 3 colors), and
then 4 balls are randomly chosen from the urn. Let X be
the number of these balls that are of the chosen color.


(a) Find P(X = 0).

(b) Let $X_i$ equal 1 if the ith ball selected is of the chosen
color, and let it equal 0 otherwise. Find $P(X_i = 1)$,
i = 1, 2, 3, 4.

(c) Find E[X].

Hint: Express X in terms of $X_1, X_2, X_3, X_4$.


Theoretical Exercises


4.1. There are N distinct types of coupons, and each time
one is obtained it will, independently of past choices, be
of type i with probability $P_i$, i = 1, ..., N. Let T denote the
number one need select to obtain at least one of each type.
Compute P{T = n}.

Hint: Use an argument similar to the one used in Example 1e.


4.2. If X has distribution function F, what is the distribution
function of $e^X$?


4.3. If X has distribution function F, what is the distribution
function of the random variable αX + β, where α and
β are constants, α ≠ 0?


4.4. The random variable X is said to have the Yule-Simons
distribution if

$$P\{X = n\} = \frac{4}{n(n+1)(n+2)}, \quad n \ge 1$$

(a) Show that the preceding is actually a probability mass
function. That is, show that $\sum_{n=1}^{\infty} P\{X = n\} = 1$.

(b) Show that E[X] = 2.

(c) Show that $E[X^2] = \infty$.

Hint: For (a), first use that

$$\frac{1}{n(n+1)(n+2)} = \frac{1}{n(n+1)} - \frac{1}{n(n+2)}$$

then use that

$$\frac{k}{n(n+k)} = \frac{1}{n} - \frac{1}{n+k}$$


4.5. Let N be a nonnegative integer-valued random variable.
For nonnegative values $a_j$, $j \ge 1$, show that

$$\sum_{j=1}^{\infty} (a_1 + \cdots + a_j) P\{N = j\} = \sum_{i=1}^{\infty} a_i P\{N \ge i\}$$

Then show that

$$E[N] = \sum_{i=1}^{\infty} P\{N \ge i\}$$

and

$$E[N(N + 1)] = 2 \sum_{i=1}^{\infty} i P\{N \ge i\}$$


4.6. The distribution of a random variable X, whose range
is {−a, 0, a}, is given by

$$P\{X = -a\} = p_1, \quad P\{X = a\} = p_2$$

Given that X has mean 0 and variance 1, work out the values
of $p_1$, $p_2$ in terms of a.


4.7. Let X be a random variable with mean μ and variance
$\sigma^2$. Prove that

$$E[(X - \mu)^3] = E[X^3] - 3\sigma^2\mu - \mu^3$$


4.8. Find the first three moments and the variance of a ran- 
dom variable X with distribution 


PIX =i} = a $=123.4 


4.9. Show how the derivation of the binomial probabilities

$$P\{X = i\} = \binom{n}{i} p^i (1 - p)^{n-i}, \quad i = 0, \ldots, n$$

leads to a proof of the binomial theorem

$$(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}$$

when x and y are nonnegative.

Hint: Let $p = \frac{x}{x+y}$.


4.10. Let X be a binomial random variable with parame- 
ters n and p. Show that: 


$$E[(1 - p)^X] = (1 - p^2)^n$$


4.11. Let X be the number of successes that result from 2n 
independent trials, when each trial is a success with prob- 
ability p. Show that P(X = n) is a decreasing function of 
n. 


4.12. A random walk $S_n$ consists of sums of successive
steps $X_i$, each of which can be ±1, with probability p for
$X_i = 1$, such that $S_n = \sum_{i=1}^{n} X_i$. Show that $(S_n + n)/2$ is
binomially distributed and work out its mean and variance.


4.13. There are n components lined up in a linear arrange- 
ment. Suppose that each component independently func- 
tions with probability p. What is the probability that no 2 
neighboring components are both nonfunctional? 


Hint: Condition on the number of defective components 
and use the results of Example 4c of Chapter 1. 


4.14. Let X be a binomial random variable with parameters
(n, p). What value of p maximizes P{X = k}, k =
0, 1, ..., n? This is an example of a statistical method used
to estimate p when a binomial (n, p) random variable is 
observed to equal k. If we assume that n is known, then 
we estimate p by choosing that value of p that maximizes 
P{X = k}. This is known as the method of maximum like- 
lihood estimation. 


4.15. A family has n children with probability $\alpha p^n$, n ≥ 1,
where α ≤ (1 − p)/p.

(a) What proportion of families has no children? 

(b) If each child is equally likely to be a boy or a girl 
(independently of each other), what proportion of families 
consists of k boys (and any number of girls)? 


4.16. Suppose that n independent tosses of a coin having 
probability p of coming up heads are made. Show that 
the probability that an even number of heads results is
$\frac{1}{2}[1 + (q - p)^n]$, where q = 1 − p. Do this by proving
and then utilizing the identity

$$\sum_{i=0}^{[n/2]} \binom{n}{2i} p^{2i} q^{n-2i} = \frac{1}{2}\left[(p + q)^n + (q - p)^n\right]$$

where [n/2] is the largest integer less than or equal to
n/2. Compare this exercise with Theoretical Exercise 3.5
of Chapter 3.


4.17. Let X be a Poisson random variable with parameter
λ. Show that P{X = i} increases monotonically and then
decreases monotonically as i increases, reaching its maximum
when i is the largest integer not exceeding λ.


Hint: Consider P{X = i}/P{X =i — 1}. 


4.18. Let X be a Poisson random variable with parameter
λ.


(a) Show that

$$P\{X \text{ is even}\} = \frac{1}{2}\left[1 + e^{-2\lambda}\right]$$

by using the result of Theoretical Exercise 4.16 and the
relationship between Poisson and binomial random variables.

(b) Verify the formula in part (a) directly by making use of
the expansion of $e^{-\lambda} + e^{\lambda}$.


4.19. Let X be a Poisson random variable with parameter
λ. What value of λ maximizes P{X = k}, k ≥ 0?


4.20. Show that if X is a Poisson random variable with
parameter λ, then

$$E[X^n] = \lambda E[(X + 1)^{n-1}]$$

Now use this result to compute $E[X^3]$.


4.21. Consider n coins, each of which independently comes 
up heads with probability p. Suppose that n is large and p 
is small, and let λ = np. Suppose that all coins are tossed;
if at least one comes up heads, the experiment ends; if not,
we again toss all n coins, and so on. That is, we stop the first
time that at least one of the n coins comes up heads. Let
X denote the total number of heads that appear. Which
of the following reasonings concerned with approximating
P{X = 1} is correct (in all cases, Y is a Poisson random
variable with parameter λ)?


(a) Because the total number of heads that occur when all
n coins are tossed is approximately a Poisson random variable
with parameter λ,

$$P\{X = 1\} \approx P\{Y = 1\} = \lambda e^{-\lambda}$$


(b) Because the total number of heads that occur when all
n coins are tossed is approximately a Poisson random variable
with parameter λ, and because we stop only when this
number is positive,

$$P\{X = 1\} \approx P\{Y = 1 \mid Y > 0\} = \frac{\lambda e^{-\lambda}}{1 - e^{-\lambda}}$$


(c) Because at least one coin comes up heads, X will equal
1 if none of the other n − 1 coins come up heads. Because
the number of heads resulting from these n − 1 coins is
approximately Poisson with mean (n − 1)p ≈ λ,

$$P\{X = 1\} \approx P\{Y = 0\} = e^{-\lambda}$$


4.22. From a set of n randomly chosen people, let $E_{ij}$
denote the event that persons i and j have the same birthday.
Assume that each person is equally likely to have any
of the 365 days of the year as his or her birthday. Find


(a) $P(E_{3,4} \mid E_{1,2})$;
(b) $P(E_{1,3} \mid E_{1,2})$;
(c) $P(E_{2,3} \mid E_{1,2} \cap E_{1,3})$.


What can you conclude from your answers to parts (a)–(c)
about the independence of the $\binom{n}{2}$ events $E_{ij}$?


4.23. An urn contains 2n balls, of which 2 are numbered 1, 
2 are numbered 2,..., and 2 are numbered n. Balls are suc- 
cessively withdrawn 2 at a time without replacement. Let 
T denote the first selection in which the balls withdrawn 
have the same number (and let it equal infinity if none of 
the pairs withdrawn has the same number). We want to 
show that, for 0 < α < 1,

$$\lim_{n \to \infty} P\{T > \alpha n\} = e^{-\alpha/2}$$

To verify the preceding formula, let $M_k$ denote the number
of pairs withdrawn in the first k selections, k = 1, ..., n.


(a) Argue that when n is large, $M_k$ can be regarded as the
number of successes in k (approximately) independent trials.

(b) Approximate $P\{M_k = 0\}$ when n is large.

(c) Write the event {T > αn} in terms of the value of one
of the variables $M_k$.

(d) Verify the limiting probability given for P{T > αn}.


4.24. Consider a random collection of n individuals. In 
approximating the probability that no 3 of these individ- 
uals share the same birthday, a better Poisson approxima- 
tion than that obtained in the text (at least for values of n 
between 80 and 90) is obtained by letting $E_i$ be the event
that there are at least 3 birthdays on day i, i = 1, ..., 365.


(a) Find $P(E_i)$.
(b) Give an approximation for the probability that no 3 
individuals share the same birthday. 


(c) Evaluate the preceding when n = 88. (The exact prob- 
ability is derived in Example 1g of Chapter 6.) 


4.25. Here is another way to obtain a set of recursive equations
for determining $P_n$, the probability that there is a
string of k consecutive heads in a sequence of n flips of
a coin that comes up heads with probability p:


(a) Argue that for k < n, there will be a string of k consecutive
heads if either


1. there is a string of k consecutive heads within the
first n − 1 flips, or

2. there is no string of k consecutive heads within the
first n − k − 1 flips, flip n − k is a tail, and flips
n − k + 1, ..., n are all heads.


(b) Using the preceding, relate $P_n$ to $P_{n-1}$. Starting with
$P_k = p^k$, the recursion can be used to obtain $P_{k+1}$, then
$P_{k+2}$, and so on, up to $P_n$.
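
The recursion described in parts (a) and (b) can be sketched in code as follows. This is only one way to organize the computation, under the assumption that the two cases in part (a) are disjoint, so that their probabilities add; the function name `prob_run_of_heads` is hypothetical, and the convention that the probability is 0 for fewer than k flips is ours.

```python
def prob_run_of_heads(n: int, k: int, p: float) -> float:
    """P(some string of k consecutive heads occurs in n flips), k >= 1.

    Uses the recursion suggested by part (a):
        P_j = P_{j-1} + (1 - P_{j-k-1}) * (1 - p) * p**k,   j > k,
    started from P_k = p**k, with P_j = 0 for j < k.
    """
    if n < k:
        return 0.0
    probs = [0.0] * (n + 1)  # probs[j] holds P_j; entries below k stay 0
    probs[k] = p**k
    for j in range(k + 1, n + 1):
        # Case 1: a run already exists in the first j-1 flips.
        # Case 2: no run in the first j-k-1 flips, flip j-k is a tail,
        #         and flips j-k+1, ..., j are all heads.
        probs[j] = probs[j - 1] + (1.0 - probs[j - k - 1]) * (1.0 - p) * p**k
    return probs[n]


if __name__ == "__main__":
    # Setting of Problem 4.72: fair coin, 10 flips, string of 4 heads
    print(prob_run_of_heads(10, 4, 0.5))
```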


4.26. Suppose that the number of events that occur in a 
specified time is a Poisson random variable with parameter
λ. If each event is counted with probability p, independently
of every other event, show that the number of
events that are counted is a Poisson random variable with
parameter λp. Also, give an intuitive argument as to why
this should be so. As an application of the preceding result,
suppose that the number of distinct uranium deposits in
a given area is a Poisson random variable with parameter
λ = 10. If, in a fixed period of time, each deposit
is discovered independently with probability a find the 
probability that (a) exactly 1, (b) at least 1, and (c) at most 
1 deposit is discovered during that time. 


4.27. Prove

$$\sum_{i=0}^{n} e^{-\lambda} \frac{\lambda^i}{i!} = \frac{1}{n!} \int_{\lambda}^{\infty} e^{-x} x^n \, dx$$

Hint: Use integration by parts.


4.28. If X is a geometric random variable, show analyti- 
cally that 


$$P\{X = n + k \mid X > n\} = P\{X = k\}$$


Using the interpretation of a geometric random variable, 
give a verbal argument as to why the preceding equation 
is true. 


4.29. Let X be a negative binomial random variable with 
parameters r and p, and let Y be a binomial random vari- 
able with parameters n and p. Show that 


$$P\{X > n\} = P\{Y < r\}$$


Hint: Either one could attempt an analytical proof of the
preceding equation, which is equivalent to proving the
identity

$$\sum_{i=n+1}^{\infty} \binom{i-1}{r-1} p^r (1 - p)^{i-r} = \sum_{i=0}^{r-1} \binom{n}{i} p^i (1 - p)^{n-i}$$

or one could attempt a proof that uses the probabilistic
interpretation of these random variables. That is, in the
latter case, start by considering a sequence of independent
trials having a common probability p of success. Then try
to express the events {X > n} and {Y < r} in terms of the
outcomes of this sequence.


4.30. For a hypergeometric random variable, determine 


$$P\{X = k + 1\} / P\{X = k\}$$


4.31. Balls numbered 1 through N are in an urn. Suppose
that n, n ≤ N, of them are randomly selected without
replacement. Let Y denote the largest number selected. 


(a) Find the probability mass function of Y. 

(b) Derive an expression for E[Y] and then use Fer- 
mat’s combinatorial identity (see Theoretical Exercise 11 
of Chapter 1) to simplify the expression. 


4.32. A jar contains m + n chips, numbered 1,2,..., 
n + m. A set of size n is drawn. If we let X denote the 
number of chips drawn having numbers that exceed each 
of the numbers of those remaining, compute the probabil- 
ity mass function of X. 


4.33. A jar contains n chips. Suppose that a boy succes- 
sively draws a chip from the jar, each time replacing the 
one drawn before drawing another. The process continues 
until the boy draws a chip that he has previously drawn. 
Let X denote the number of draws, and compute its prob- 
ability mass function. 


4.34. Repeat Theoretical Exercise 4.33, this time assuming 
that withdrawn chips are not replaced before the next 
selection. 


4.35. From a set of n elements, a nonempty subset is cho- 
sen at random in the sense that all of the nonempty subsets 
are equally likely to be selected. Let X denote the number 
of elements in the chosen subset. Using the identities given 
in Theoretical Exercise 12 of Chapter 1, show that

$$\mathrm{Var}(X) = \frac{n \cdot 2^{2n-2} - n(n + 1) 2^{n-2}}{(2^n - 1)^2}$$

Show also that for n large,

$$\mathrm{Var}(X) \sim \frac{n}{4}$$

in the sense that the ratio of Var(X) to n/4 approaches 1 as
n approaches ∞. Compare this formula with the limiting
form of Var(Y) when P{Y = i} = 1/n, i = 1,...,n.


4.36. An urn initially contains one red and one blue ball. 
At each stage, a ball is randomly chosen and then replaced 
along with another of the same color. Let X denote the 
selection number of the first chosen ball that is blue. For 
instance, if the first selection is red and the second blue, 
then X is equal to 2.


(a) Find P{X > i}, i ≥ 1.

(b) Show that with probability 1, a blue ball is eventually
chosen. (That is, show that P{X < ∞} = 1.)

(c) Find E[X].


4.37. Suppose the possible values of X are $\{x_i\}$, the possible
values of Y are $\{y_j\}$, and the possible values of X + Y
are $\{z_k\}$. Let $A_k$ denote the set of all pairs of indices (i, j)
such that $x_i + y_j = z_k$; that is, $A_k = \{(i, j) : x_i + y_j = z_k\}$.

(a) Argue that

$$P\{X + Y = z_k\} = \sum_{(i,j) \in A_k} P\{X = x_i, Y = y_j\}$$

(b) Show that

$$E[X + Y] = \sum_k \sum_{(i,j) \in A_k} (x_i + y_j) P\{X = x_i, Y = y_j\}$$

(c) Using the formula from part (b), argue that

$$E[X + Y] = \sum_i \sum_j (x_i + y_j) P\{X = x_i, Y = y_j\}$$

(d) Show that

$$P\{X = x_i\} = \sum_j P\{X = x_i, Y = y_j\}, \qquad P\{Y = y_j\} = \sum_i P\{X = x_i, Y = y_j\}$$

(e) Prove that

$$E[X + Y] = E[X] + E[Y]$$


Self-Test Problems and Exercises


4.1. Suppose that the random variable X is equal to the
number of hits obtained by a certain baseball player in his
next 3 at bats. If P{X = 1} = .3, P{X = 2} = .2, and
P{X = 0} = 3P{X = 3}, find E[X].


4.2. Suppose that X takes on one of the values 0, 1, and 2.
If for some constant c, P{X = i} = cP{X = i − 1}, i = 1, 2,
find E[X].


4.3. A coin that when flipped comes up heads with probability
p is flipped until either heads or tails has occurred
twice. Find the expected number of flips.


4.4. A certain community is composed of m families, $n_i$ of
which have i children, $\sum_{i=1}^{r} n_i = m$. If one of the families is
randomly chosen, let X denote the number of children in
that family. If one of the $\sum_{i=1}^{r} i n_i$ children is randomly chosen,
let Y denote the total number of children in the family
of that child. Show that E[Y] ≥ E[X].


4.5. Suppose that P{X = 0} = 1 − P{X = 1}. If E[X] =
3Var(X), find P{X = 0}.


4.6. There are 2 coins in a bin. When one of them is flipped,
it lands on heads with probability .6, and when the other is
flipped, it lands on heads with probability .3. One of these
coins is to be randomly chosen and then flipped. Without
knowing which coin is chosen, you can bet any amount up
to $10, and you then either win that amount if the coin
comes up heads or lose it if it comes up tails. Suppose, however,
that an insider is willing to sell you, for an amount
C, the information as to which coin was selected. What is
your expected payoff if you buy this information? Note
that if you buy it and then bet x, you will end up either
winning x − C or −x − C (that is, losing x + C in the latter
case). Also, for what values of C does it pay to purchase
the information?


4.7. A philanthropist writes a positive number x on a piece 
of red paper, shows the paper to an impartial observer, 
and then turns it face down on the table. The observer 
then flips a fair coin. If it shows heads, she writes the 
value 2x and, if tails, the value x/2, on a piece of blue 
paper, which she then turns face down on the table. With- 
out knowing either the value x or the result of the coin 
flip, you have the option of turning over either the red or 
the blue piece of paper. After doing so and observing the 
number written on that paper, you may elect to receive 
as a reward either that amount or the (unknown) amount 
written on the other piece of paper. For instance, if you 
elect to turn over the blue paper and observe the value 
100, then you can elect either to accept 100 as your reward 
or to take the amount (either 200 or 50) on the red paper. 
Suppose that you would like your expected reward to be 
large. 


(a) Argue that there is no reason to turn over the red 
paper first, because if you do so, then no matter what 
value you observe, it is always better to switch to the blue 
paper. 

(b) Let y be a fixed nonnegative value, and consider the 
following strategy: Turn over the blue paper, and if its 
value is at least y, then accept that amount. If it is less 
than y, then switch to the red paper. Let Ry(x) denote the 
reward obtained if the philanthropist writes the amount 
x and you employ this strategy. Find E[LRy(x)]. Note that 
E|Ro(x)] is the expected reward if the philanthropist writes 
the amount x when you employ the strategy of always 
choosing the blue paper. 


4.8. Let B(n, p) represent a binomial random variable with 
parameters n and p. Argue that 


$$P\{B(n, p) \le i\} = 1 - P\{B(n, 1 - p) \le n - i - 1\}$$


Hint: The number of successes less than or equal to i is 
equivalent to what statement about the number of fail- 
ures? 


4.9. If X is a binomial random variable with expected 
value 6 and variance 2.4, find P{X = 5}. 


4.10. An urn contains n balls numbered 1 through n. If 
you withdraw m balls randomly in sequence, each time 
replacing the ball selected previously, find P{X = k}, k =
1,...,m, where X is the maximum of the m chosen
numbers.


Hint: First find P{X ≤ k}.


4.11. Teams A and B play a series of games, with the first 
team to win 3 games being declared the winner of the 
series. Suppose that team A independently wins each game 
with probability p. Find the conditional probability that 
team A wins 


(a) the series given that it wins the first game; 
(b) the first game given that it wins the series. 


4.12. A local soccer team has 5 more games left to play. 
If it wins its game this weekend, then it will play its final 
4 games in the upper bracket of its league, and if it loses, 
then it will play its final games in the lower bracket. If it 
plays in the upper bracket, then it will independently win 
each of its games in this bracket with probability .4, and 
if it plays in the lower bracket, then it will independently 
win each of its games with probability .7. If the probability
that the team wins its game this weekend is .5, what is the 
probability that it wins at least 3 of its final 4 games? 


4.13. Each of the members of a 7-judge panel indepen- 
dently makes a correct decision with probability .7. If the 
panel’s decision is made by majority rule, what is the prob- 
ability that the panel makes the correct decision? Given 
that 4 of the judges agreed, what is the probability that the 
panel made the correct decision? 


4.14. On average, 5.2 hurricanes hit a certain region in a 
year. What is the probability that there will be 3 or fewer 
hurricanes hitting this year? 


4.15. The number of eggs laid on a tree leaf by an insect 
of a certain type is a Poisson random variable with parameter
$\lambda$. However, such a random variable can be observed
only if it is positive, since if it is 0, then we cannot know
that such an insect was on the leaf. If we let Y denote the
observed number of eggs, then


$$P\{Y = i\} = P\{X = i \mid X > 0\}$$

where X is Poisson with parameter $\lambda$. Find E[Y].


4.16. Each of n boys and n girls, independently and randomly,
chooses a member of the other sex. If a boy and
girl choose each other, they become a couple. Number the
girls, and let $G_i$ be the event that girl number i is part of a
couple. Let $P_0 = 1 - P(\cup_{i=1}^{n} G_i)$ be the probability that no
couples are formed.


(a) What is $P(G_i)$?
(b) What is $P(G_i \mid G_j)$?
(c) When n is large, approximate $P_0$.


(d) When n is large, approximate $P_k$, the probability that
exactly k couples are formed.


(e) Use the inclusion-exclusion identity to evaluate $P_0$.


4.17. A total of 2n people, consisting of n married couples, 
are randomly divided into n pairs. Arbitrarily number the 
women, and let $W_i$ denote the event that woman i is paired
with her husband.


(a) Find $P(W_i)$.

(b) For $i \ne j$, find $P(W_i \mid W_j)$.

(c) When n is large, approximate the probability that no 
wife is paired with her husband. 


(d) If each pairing must consist of a man and a woman, 
what does the problem reduce to? 


4.18. A casino patron will continue to make $5 bets on red 
in roulette until she has won 4 of these bets. 


(a) What is the probability that she places a total of 9 
bets? 

(b) What are her expected winnings when she stops? 
Remark: On each bet, she will either win $5 with probability
18/38 or lose $5 with probability 20/38.


4.19. When three friends go for coffee, they decide who 
will pay the check by each flipping a coin and then letting 
the “odd person” pay. If all three flips produce the same 
result (so that there is no odd person), then they make a 
second round of flips, and they continue to do so until there 
is an odd person. What is the probability that 


(a) exactly 3 rounds of flips are made? 
(b) more than 4 rounds are needed? 


4.20. Show that if X is a geometric random variable with
parameter p, then

$$E[1/X] = \frac{-p \log(p)}{1 - p}$$

Hint: You will need to evaluate an expression of the form
$\sum_{i=1}^{\infty} a^i / i$. To do so, write $a^i / i = \int_0^a x^{i-1} \, dx$, and then
interchange the sum and the integral.


4.21. Suppose that

$$P\{X = a\} = p, \qquad P\{X = b\} = 1 - p$$

(a) Show that $\frac{X - b}{a - b}$ is a Bernoulli random variable.
(b) Find Var(X).


4.22. Each game you play is a win with probability p. You 
plan to play 5 games, but if you win the fifth game, then 
you will keep on playing until you lose. 

(a) Find the expected number of games that you play. 

(b) Find the expected number of games that you lose. 


4.23. Balls are randomly withdrawn, one at a time without 
replacement, from an urn that initially has N white and 
M black balls. Find the probability that n white balls are 
drawn before m black balls, n ≤ N, m ≤ M.


4.24. Ten balls are to be distributed among 5 urns, with 
each ball going into urn i with probability $p_i$, $\sum_{i=1}^{5} p_i = 1$.
Let $X_i$ denote the number of balls that go into urn i.
Assume that events corresponding to the locations of different
balls are independent.

(a) What type of random variable is $X_i$? Be as specific as
possible.

(b) For $i \ne j$, what type of random variable is $X_i + X_j$?
(c) Find $P\{X_1 + X_2 + X_3 = 7\}$.

4.25. For the match problem (Example 5m in Chapter 2), 
find 

(a) the expected number of matches. 

(b) the variance of the number of matches. 


4.26. Let $\alpha$ be the probability that a geometric random
variable X with parameter p is an even number.


(a) Find $\alpha$ by using the identity $\alpha = \sum_{i=1}^{\infty} P\{X = 2i\}$.
(b) Find $\alpha$ by conditioning on whether X = 1 or X > 1.


4.27. Two teams will play a series of games, with the winner 
being the first team to win a total of 4 games. Suppose that, 
independently of earlier results, team 1 wins each game it 
plays with probability p,0 < p < 1. Let N denote the 
number of games that are played. 


(a) Show that P{N = 6} ≥ P{N = 7} with equality only
when p = 1/2. 
(b) Give an intuitive explanation for why equality results 
when p = 1/2. 


Hint: Consider what needs to be true in order for the num- 
ber of games to be either 6 or 7. 

(c) If p = 1/2, find the probability that the team that wins 
the first game wins the series. 


4.28. An urn has n white and m black balls. Balls are randomly
withdrawn, without replacement, until a total of
k, k ≤ n, white balls have been withdrawn. The random
variable X equal to the total number of balls that are
withdrawn is said to be a negative hypergeometric random
variable.


(a) Explain how such a random variable differs from a negative
binomial random variable.
(b) Find P{X = r}.


Hint for (b): In order for X = r to happen, what must be
the results of the first r − 1 withdrawals?


4.29. There are 3 coins which when flipped come up heads, 
respectively, with probabilities 1/3, 1/2, 3/4. One of these 
coins is randomly chosen and continually flipped. 


(a) Find the probability that there are a total of 5 heads in 
the first 8 flips. 


(b) Find the probability that the first head occurs on flip 5. 


4.30. If X is a binomial random variable with parameters
n and p, what type of random variable is n − X?


4.31. Let X be the ith smallest number in a random sample
of n of the numbers 1,...,n + m. Find the probability
mass function of X.


4.32. Balls are randomly removed from an urn consisting 
of n red and m blue balls. Let X denote the number of balls
that have to be removed until a total of r red balls have 
been removed. X is said to be a negative hypergeometric 
random variable. 


(a) Find the probability mass function of X. 

(b) Find the probability mass function of V, equal to the 
number of balls that have to be removed until either r red 
balls or s blue balls have been removed. 

(c) Find the probability mass function of Z, equal to the 
number of balls that have to be removed until both at least 
r red balls and at least s blue balls have been removed. 


(d) Find the probability that r red balls are removed before 
s blue balls have been removed. 


5.1 Introduction 


In Chapter 4, we considered discrete random variables—that is, random variables 
whose set of possible values is either finite or countably infinite. However, there also 
exist random variables whose set of possible values is uncountable. Two examples 
are the time that a train arrives at a specified stop and the lifetime of a transistor. 
Let X be such a random variable. We say that X is a continuous† random variable
if there exists a nonnegative function f, defined for all real $x \in (-\infty, \infty)$, having the
property that for any set B of real numbers,‡

$$P\{X \in B\} = \int_B f(x) \, dx \tag{1.1}$$


The function f is called the probability density function of the random variable X. 
(See Figure 5.1.) 

In words, Equation (1.1) states that the probability that X will be in B may be 
obtained by integrating the probability density function over the set B. Since X must 
assume some value, f must satisfy 


$$1 = P\{X \in (-\infty, \infty)\} = \int_{-\infty}^{\infty} f(x) \, dx$$


All probability statements about X can be answered in terms of f. For instance, from 
Equation (1.1), letting B = [a, b], we obtain

$$P\{a \le X \le b\} = \int_a^b f(x) \, dx \tag{1.2}$$


† Sometimes called absolutely continuous.

‡ Actually, for technical reasons, Equation (1.1) is true only for the measurable sets B, which, fortunately, include
all sets of practical interest.

Figure 5.1 Probability density function f. P{a ≤ X ≤ b} = area of the shaded region between a and b.


If we let a = b in Equation (1.2), we get

$$P\{X = a\} = \int_a^a f(x) \, dx = 0$$

In words, this equation states that the probability that a continuous random variable
will assume any fixed value is zero. Hence, for a continuous random variable,

$$P\{X < a\} = P\{X \le a\} = F(a) = \int_{-\infty}^{a} f(x) \, dx$$


Example 1a

Suppose that X is a continuous random variable whose probability density function
is given by

$$f(x) = \begin{cases} C(4x - 2x^2) & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$

(a) What is the value of C?
(b) Find P{X > 1}.


Solution (a) Since f is a probability density function, we must have $\int_{-\infty}^{\infty} f(x) \, dx = 1$,
implying that

$$C \int_0^2 (4x - 2x^2) \, dx = 1$$

or

$$C \left[ 2x^2 - \frac{2x^3}{3} \right]_{x=0}^{x=2} = 1$$

or

$$C = \frac{3}{8}$$

Hence,

(b) $$P\{X > 1\} = \int_1^{\infty} f(x) \, dx = \frac{3}{8} \int_1^2 (4x - 2x^2) \, dx = \frac{1}{2}$$


Example 1b

The amount of time in hours that a computer functions before breaking down is a
continuous random variable with probability density function given by

$$f(x) = \begin{cases} \lambda e^{-x/100} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

What is the probability that

(a) a computer will function between 50 and 150 hours before breaking down?
(b) it will function for fewer than 100 hours?


Solution (a) Since

$$1 = \int_{-\infty}^{\infty} f(x) \, dx = \lambda \int_0^{\infty} e^{-x/100} \, dx$$

we obtain

$$1 = -\lambda (100) e^{-x/100} \Big|_0^{\infty} = 100\lambda \quad \text{or} \quad \lambda = \frac{1}{100}$$

Hence, the probability that a computer will function between 50 and 150 hours
before breaking down is given by

$$P\{50 < X < 150\} = \int_{50}^{150} \frac{1}{100} e^{-x/100} \, dx = -e^{-x/100} \Big|_{50}^{150} = e^{-1/2} - e^{-3/2} \approx .383$$

(b) Similarly,

$$P\{X < 100\} = \int_0^{100} \frac{1}{100} e^{-x/100} \, dx = -e^{-x/100} \Big|_0^{100} = 1 - e^{-1} \approx .632$$

In other words, approximately 63.2 percent of the time, a computer will fail before
registering 100 hours of use.


Example 1c

The lifetime in hours of a certain kind of radio tube is a random variable having a
probability density function given by

$$f(x) = \begin{cases} 0 & x \le 100 \\ \dfrac{100}{x^2} & x > 100 \end{cases}$$

What is the probability that exactly 2 of 5 such tubes in a radio set will have to
be replaced within the first 150 hours of operation? Assume that the events $E_i$, i =
1, 2, 3, 4, 5, that the ith such tube will have to be replaced within this time are
independent.


Solution From the statement of the problem, we have

$$P(E_i) = \int_0^{150} f(x) \, dx = 100 \int_{100}^{150} x^{-2} \, dx = \frac{1}{3}$$

Hence, from the independence of the events $E_i$, it follows that the desired probability is

$$\binom{5}{2} \left( \frac{1}{3} \right)^2 \left( \frac{2}{3} \right)^3 = \frac{80}{243}$$
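
The integrals in Examples 1a, 1b, and 1c can be checked numerically. The following is a minimal sketch, not part of the text: the helper `integrate` and every variable name are illustrative assumptions, and the composite midpoint rule is just one convenient way to approximate the integrals.

```python
import math


def integrate(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Example 1a: f(x) = C(4x - 2x^2) on (0, 2); C is fixed by total area 1.
g = lambda x: 4 * x - 2 * x**2
C = 1 / integrate(g, 0, 2)
p_1a = C * integrate(g, 1, 2)              # P{X > 1}

# Example 1b: f(x) = (1/100) e^{-x/100} for x >= 0.
f_1b = lambda x: math.exp(-x / 100) / 100
p_1b_a = integrate(f_1b, 50, 150)          # P{50 < X < 150}
p_1b_b = integrate(f_1b, 0, 100)           # P{X < 100}

# Example 1c: f(x) = 100 / x^2 for x > 100; exactly 2 of 5 tubes fail by 150 hours.
p_tube = integrate(lambda x: 100 / x**2, 100, 150)
p_1c = math.comb(5, 2) * p_tube**2 * (1 - p_tube) ** 3

print(C, p_1a, p_1b_a, p_1b_b, p_1c)
```

The printed values should be close to 3/8, 1/2, $e^{-1/2} - e^{-3/2}$, $1 - e^{-1}$, and 80/243, respectively.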


The relationship between the cumulative distribution F and the probability density
f is expressed by

$$F(a) = P\{X \in (-\infty, a]\} = \int_{-\infty}^{a} f(x) \, dx$$

Differentiating both sides of the preceding equation yields

$$\frac{d}{da} F(a) = f(a)$$

That is, the density is the derivative of the cumulative distribution function. A somewhat
more intuitive interpretation of the density function may be obtained from
Equation (1.2) as follows:

$$P\left\{ a - \frac{\varepsilon}{2} \le X \le a + \frac{\varepsilon}{2} \right\} = \int_{a - \varepsilon/2}^{a + \varepsilon/2} f(x) \, dx \approx \varepsilon f(a)$$

when $\varepsilon$ is small and when f(·) is continuous at x = a. In other words, the probability
that X will be contained in an interval of length $\varepsilon$ around the point a is approximately
$\varepsilon f(a)$. From this result, we see that f(a) is a measure of how likely it is that the
random variable will be near a.


Example 1d

If X is continuous with distribution function $F_X$ and density function $f_X$, find the
density function of Y = 2X.


Solution We will determine $f_Y$ in two ways. The first way is to derive, and then
differentiate, the distribution function of Y:

$$F_Y(a) = P\{Y \le a\} = P\{2X \le a\} = P\{X \le a/2\} = F_X(a/2)$$

Differentiation gives

$$f_Y(a) = \frac{1}{2} f_X(a/2)$$

Another way to determine $f_Y$ is to note that

$$\varepsilon f_Y(a) \approx P\left\{ a - \frac{\varepsilon}{2} \le Y \le a + \frac{\varepsilon}{2} \right\} = P\left\{ \frac{a}{2} - \frac{\varepsilon}{4} \le X \le \frac{a}{2} + \frac{\varepsilon}{4} \right\} \approx \frac{\varepsilon}{2} f_X(a/2)$$

Dividing through by $\varepsilon$ gives the same result as before.


5.2 Expectation and Variance of Continuous Random Variables


In Chapter 4, we defined the expected value of a discrete random variable X by

$$E[X] = \sum_x x P\{X = x\}$$

If X is a continuous random variable having probability density function f(x), then,
because

$$f(x) \, dx \approx P\{x \le X \le x + dx\} \quad \text{for } dx \text{ small}$$

it is easy to see that the analogous definition is to define the expected value of X by

$$E[X] = \int_{-\infty}^{\infty} x f(x) \, dx$$


Example 2a

Find E[X] when the density function of X is

$$f(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Solution

$$E[X] = \int x f(x) \, dx = \int_0^1 2x^2 \, dx = \frac{2}{3}$$


Example 2b

The density function of X is given by

$$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Find $E[e^X]$.


Solution Let $Y = e^X$. We start by determining $F_Y$, the cumulative distribution function
of Y. Now, for 1 ≤ x ≤ e,

$$F_Y(x) = P\{Y \le x\} = P\{e^X \le x\} = P\{X \le \log(x)\} = \int_0^{\log(x)} f(y) \, dy = \log(x)$$

By differentiating $F_Y(x)$, we can conclude that the probability density function of Y
is given by

$$f_Y(x) = \frac{1}{x} \qquad 1 \le x \le e$$

Hence,

$$E[e^X] = E[Y] = \int_{-\infty}^{\infty} x f_Y(x) \, dx = \int_1^e dx = e - 1$$


Although the method employed in Example 2b to compute the expected value
of a function of X is always applicable, there is, as in the discrete case, an alternative
way of proceeding. The following is a direct analog of Proposition 4.1 of Chapter 4.


Proposition 2.1

If X is a continuous random variable with probability density function f(x), then, for
any real-valued function g,

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x) \, dx$$


An application of Proposition 2.1 to Example 2b yields

$$E[e^X] = \int_0^1 e^x \, dx = e - 1 \qquad \text{since } f(x) = 1, \ 0 < x < 1$$

which is in accord with the result obtained in that example.

The proof of Proposition 2.1 is more involved than that of its discrete random
variable analog. We will present such a proof under the provision that the random
variable g(X) is nonnegative. (The general proof, which follows the argument in the
case we present, is indicated in Theoretical Exercises 5.2 and 5.3.) We will need the
following lemma, which is of independent interest.


Lemma 2.1

For a nonnegative random variable Y,

$$E[Y] = \int_0^{\infty} P\{Y > y\} \, dy$$


Proof We present a proof when Y is a continuous random variable with probability
density function $f_Y$. We have

$$\int_0^{\infty} P\{Y > y\} \, dy = \int_0^{\infty} \int_y^{\infty} f_Y(x) \, dx \, dy$$

where we have used the fact that $P\{Y > y\} = \int_y^{\infty} f_Y(x) \, dx$. Interchanging the order
of integration in the preceding equation yields

$$\int_0^{\infty} P\{Y > y\} \, dy = \int_0^{\infty} \left( \int_0^x dy \right) f_Y(x) \, dx = \int_0^{\infty} x f_Y(x) \, dx = E[Y]$$
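
Lemma 2.1 can be illustrated numerically. The sketch below is an assumption-laden check rather than part of the proof: it takes Y to have the density of Example 1b, $f_Y(x) = \frac{1}{100} e^{-x/100}$, truncates both integrals at an arbitrary upper limit `UPPER`, and uses a hypothetical helper `integrate` based on the midpoint rule.

```python
import math


def integrate(f, a, b, n=200_000):
    """Approximate the integral of f over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


RATE = 1 / 100                                    # density of Example 1b
density = lambda x: RATE * math.exp(-RATE * x)    # f_Y(x), x >= 0
tail = lambda y: math.exp(-RATE * y)              # P{Y > y}

UPPER = 5_000.0  # truncation point; the neglected mass is of order e^{-50}

mean_from_density = integrate(lambda x: x * density(x), 0.0, UPPER)
mean_from_tail = integrate(tail, 0.0, UPPER)

print(mean_from_density, mean_from_tail)
```

Both quantities should agree with each other, and with the mean 100 of this density, up to the discretization error.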


Proof of Proposition 2.1 From Lemma 2.1, for any function g for which g(x) ≥ 0,

$$E[g(X)] = \int_0^{\infty} P\{g(X) > y\} \, dy$$
$$= \int_0^{\infty} \int_{x: g(x) > y} f(x) \, dx \, dy$$
$$= \int_{x: g(x) > 0} \int_0^{g(x)} dy \, f(x) \, dx$$
$$= \int_{x: g(x) > 0} g(x) f(x) \, dx$$

which completes the proof.


Example 2c

A stick of length 1 is split at a point U having density function f(u) = 1, 0 < u < 1.
Determine the expected length of the piece that contains the point p, 0 ≤ p ≤ 1.


Solution Let $L_p(U)$ denote the length of the substick that contains the point p, and
note that

$$L_p(U) = \begin{cases} 1 - U & U < p \\ U & U > p \end{cases}$$

(See Figure 5.2.) Hence, from Proposition 2.1,

$$E[L_p(U)] = \int_0^1 L_p(u) \, du$$
$$= \int_0^p (1 - u) \, du + \int_p^1 u \, du$$
$$= \frac{1}{2} - \frac{(1 - p)^2}{2} + \frac{1}{2} - \frac{p^2}{2}$$
$$= \frac{1}{2} + p(1 - p)$$


Figure 5.2 Substick containing point p: (a) U < p; (b) U > p.


Since p(1 − p) is maximized when p = 1/2, it is interesting to note that the expected
length of the substick containing the point p is maximized when p is the midpoint of
the original stick.
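
The result of Example 2c can be compared with a simulation. The following is a rough sketch under stated assumptions: the function names, the number of trials, and the seed are all illustrative choices, and the Monte Carlo estimate only approximates the exact value 1/2 + p(1 − p).

```python
import random


def expected_length_exact(p):
    """Closed form from Example 2c: E[L_p(U)] = 1/2 + p(1 - p)."""
    return 0.5 + p * (1 - p)


def expected_length_sim(p, trials=200_000, seed=0):
    """Monte Carlo estimate: break the stick at a uniform point U and
    average the length of the piece that contains p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        u = rng.random()
        total += (1 - u) if u < p else u
    return total / trials


for p in (0.1, 0.5, 0.9):
    print(p, expected_length_exact(p), expected_length_sim(p))
```

The simulated averages should track the exact values and be largest at p = 1/2, in line with the closing remark of the example.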


Example 2d

Suppose that if you are s minutes early for an appointment, then you incur the cost
cs, and if you are s minutes late, then you incur the cost ks. Suppose also that the
travel time from where you presently are to the location of your appointment is a
continuous random variable having probability density function f. Determine the
time at which you should depart if you want to minimize your expected cost.


Solution Let X denote the travel time. If you leave t minutes before your appointment,
then your cost, call it $C_t(X)$, is given by

$$C_t(X) = \begin{cases} c(t - X) & \text{if } X \le t \\ k(X - t) & \text{if } X \ge t \end{cases}$$

Therefore,

$$E[C_t(X)] = \int_0^{\infty} C_t(x) f(x) \, dx$$
$$= \int_0^t c(t - x) f(x) \, dx + \int_t^{\infty} k(x - t) f(x) \, dx$$
$$= ct \int_0^t f(x) \, dx - c \int_0^t x f(x) \, dx + k \int_t^{\infty} x f(x) \, dx - kt \int_t^{\infty} f(x) \, dx$$

The value of t that minimizes $E[C_t(X)]$ can now be obtained by calculus. Differentiation
yields

$$\frac{d}{dt} E[C_t(X)] = ct f(t) + cF(t) - ct f(t) - kt f(t) + kt f(t) - k[1 - F(t)]$$
$$= (k + c) F(t) - k$$

Equating the rightmost side to zero shows that the minimal expected cost is obtained
when you leave $t^*$ minutes before your appointment, where $t^*$ satisfies

$$F(t^*) = \frac{k}{k + c}$$


As in Chapter 4, we can use Proposition 2.1 to show the following.


Corollary 2.1

If a and b are constants, then

$$E[aX + b] = aE[X] + b$$

The proof of Corollary 2.1 for a continuous random variable X is the same as
the one given for a discrete random variable. The only modification is that the sum
is replaced by an integral and the probability mass function by a probability density
function.

The variance of a continuous random variable is defined exactly as it is for a
discrete random variable, namely, if X is a random variable with expected value $\mu$,
then the variance of X is defined (for any type of random variable) by

$$\mathrm{Var}(X) = E[(X - \mu)^2]$$

The alternative formula,

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2$$

is established in a manner similar to its counterpart in the discrete case.


Example 2e

Find Var(X) for X as given in Example 2a.


Solution We first compute $E[X^2]$.

$$E[X^2] = \int_{-\infty}^{\infty} x^2 f(x) \, dx = \int_0^1 2x^3 \, dx = \frac{1}{2}$$

Hence, since E[X] = 2/3, we obtain

$$\mathrm{Var}(X) = \frac{1}{2} - \left( \frac{2}{3} \right)^2 = \frac{1}{18}$$

It can be shown that, for constants a and b,

$$\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X)$$


The proof mimics the one given for discrete random variables. 

There are several important classes of continuous random variables that appear 
frequently in applications of probability; the next few sections are devoted to a study 
of some of them. 


5.3 The Uniform Random Variable


A random variable is said to be uniformly distributed over the interval (0, 1) if its
probability density function is given by

$$f(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases} \tag{3.1}$$

Note that Equation (3.1) is a density function, since f(x) ≥ 0 and
$\int_{-\infty}^{\infty} f(x) \, dx = \int_0^1 dx = 1$. Because f(x) > 0 only when x ∈ (0, 1), it follows that X must assume a
value in interval (0, 1). Also, since f(x) is constant for x ∈ (0, 1), X is just as likely to
be near any value in (0, 1) as it is to be near any other value. To verify this statement,
note that for any 0 < a < b < 1,

$$P\{a \le X \le b\} = \int_a^b f(x) \, dx = b - a$$

In other words, the probability that X is in any particular subinterval of (0, 1) equals
the length of that subinterval.
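
The statement that a subinterval's probability equals its length is easy to probe by simulation. The sketch below is illustrative only: the function name, trial count, and seed are assumptions, and `random.Random.random()` is used as a stand-in for a uniform (0, 1) random variable.

```python
import random


def estimate_interval_probability(a, b, trials=200_000, seed=1):
    """Estimate P{a <= X <= b} for X uniform on (0, 1) by the fraction
    of simulated values that land in [a, b]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if a <= rng.random() <= b)
    return hits / trials


for a, b in ((0.0, 0.25), (0.2, 0.5), (0.6, 0.95)):
    print((a, b), b - a, estimate_interval_probability(a, b))
```

Each estimate should be close to b − a, regardless of where the subinterval sits inside (0, 1).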

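In general, we say that X is a uniform random variable on the interval $(\alpha, \beta)$ if
the probability density function of X is given by

$$f(x) = \begin{cases} \dfrac{1}{\beta - \alpha} & \text{if } \alpha < x < \beta \\ 0 & \text{otherwise} \end{cases} \tag{3.2}$$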

Figure 5.3 Graph of (a) f(a) and (b) F(a) for a uniform $(\alpha, \beta)$ random variable.


Since $F(a) = \int_{-\infty}^{a} f(x) \, dx$, it follows from Equation (3.2) that the distribution function
of a uniform random variable on the interval $(\alpha, \beta)$ is given by

$$F(a) = \begin{cases} 0 & a \le \alpha \\ \dfrac{a - \alpha}{\beta - \alpha} & \alpha < a < \beta \\ 1 & a \ge \beta \end{cases}$$

Figure 5.3 presents a graph of f(a) and F(a).


Example 3a

Let X be uniformly distributed over $(\alpha, \beta)$. Find (a) E[X] and (b) Var(X).


Solution (a)

$$E[X] = \int_{-\infty}^{\infty} x f(x) \, dx = \int_{\alpha}^{\beta} \frac{x}{\beta - \alpha} \, dx = \frac{\beta^2 - \alpha^2}{2(\beta - \alpha)} = \frac{\beta + \alpha}{2}$$

In words, the expected value of a random variable that is uniformly distributed
over some interval is equal to the midpoint of that interval.

(b) To find Var(X), we first calculate $E[X^2]$.

$$E[X^2] = \int_{\alpha}^{\beta} \frac{x^2}{\beta - \alpha} \, dx = \frac{\beta^3 - \alpha^3}{3(\beta - \alpha)} = \frac{\beta^2 + \alpha\beta + \alpha^2}{3}$$

Hence,

$$\mathrm{Var}(X) = \frac{\beta^2 + \alpha\beta + \alpha^2}{3} - \frac{(\alpha + \beta)^2}{4} = \frac{(\beta - \alpha)^2}{12}$$

Therefore, the variance of a random variable that is uniformly distributed over
some interval is the square of the length of that interval divided by 12.


Example 3b

If X is uniformly distributed over (0, 10), calculate the probability that (a) X < 3,
(b) X > 6, and (c) 3 < X < 8.


Solution

(a) $P\{X < 3\} = \int_0^3 \frac{1}{10} \, dx = \frac{3}{10}$

(b) $P\{X > 6\} = \int_6^{10} \frac{1}{10} \, dx = \frac{4}{10}$

(c) $P\{3 < X < 8\} = \int_3^8 \frac{1}{10} \, dx = \frac{1}{2}$


Example 3c

Buses arrive at a specified stop at 15-minute intervals starting at 7 A.M. That is, they
arrive at 7, 7:15, 7:30, 7:45, and so on. If a passenger arrives at the stop at a time that
is uniformly distributed between 7 and 7:30, find the probability that he waits


(a) less than 5 minutes for a bus;
(b) more than 10 minutes for a bus.


Solution Let X denote the number of minutes past 7 that the passenger arrives at
the stop. Since X is a uniform random variable over the interval (0, 30), it follows
that the passenger will have to wait less than 5 minutes if (and only if) he arrives
between 7:10 and 7:15 or between 7:25 and 7:30. Hence, the desired probability for
part (a) is

$$P\{10 < X < 15\} + P\{25 < X < 30\} = \int_{10}^{15} \frac{1}{30} \, dx + \int_{25}^{30} \frac{1}{30} \, dx = \frac{1}{3}$$

Similarly, he would have to wait more than 10 minutes if he arrives between 7 and
7:05 or between 7:15 and 7:20, so the probability for part (b) is

$$P\{0 < X < 5\} + P\{15 < X < 20\} = \frac{1}{3}$$


The next example was first considered by the French mathematician Joseph
L. F. Bertrand in 1889 and is often referred to as Bertrand's paradox. It represents
our initial introduction to a subject commonly referred to as geometrical probability.


Example 3d

Consider a random chord of a circle. What is the probability that the length of the
chord will be greater than the side of the equilateral triangle inscribed in that circle?

Solution As stated, the problem is incapable of solution because it is not clear what 
is meant by a random chord. To give meaning to this phrase, we shall reformulate 
the problem in two distinct ways. 

The first formulation is as follows: The position of the chord can be determined 
by its distance from the center of the circle. This distance can vary between 0 and 
r, the radius of the circle. Now, the length of the chord will be greater than the side 
of the equilateral triangle inscribed in the circle if the distance from the chord to 
the center of the circle is less than 7/2. Hence, by assuming that a random chord 
is a chord whose distance D from the center of the circle is uniformly distributed 
between 0 and r, we see that the probability that the length of the chord is greater 
than the side of an inscribed equilateral triangle is 

P {p < ;| a 


r 2 
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Figure 5.4 


For our second formulation of the problem, consider an arbitrary chord of the 
circle; through one end of the chord, draw a tangent. The angle $\theta$ between the chord
and the tangent, which can vary from 0° to 180°, determines the position of the chord.
(See Figure 5.4.) Furthermore, the length of the chord will be greater than the side
of the inscribed equilateral triangle if the angle $\theta$ is between 60° and 120°. Hence,
assuming that a random chord is a chord whose angle $\theta$ is uniformly distributed
between 0° and 180°, we see that the desired answer in this formulation is

$$P\{60 < \theta < 120\} = \frac{120 - 60}{180} = \frac{1}{3}$$

Note that random experiments could be performed in such a way that $\frac{1}{2}$ or $\frac{1}{3}$ would
be the correct probability. For instance, if a circular disk of radius r is thrown on a 
table ruled with parallel lines a distance 2r apart, then one and only one of these 
lines would cross the disk and form a chord. All distances from this chord to the 
center of the disk would be equally likely, so that the desired probability that the 
chord’s length will be greater than the side of an inscribed equilateral triangle is $\frac{1}{2}$.
In contrast, if the experiment consisted of rotating a needle freely about a point A
on the edge (see Figure 5.4) of the circle, the desired answer would be $\frac{1}{3}$.
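
The two formulations can also be compared by simulation. The following is a minimal Python sketch rather than part of the original solution; the function names, the sample size, and the seed are illustrative assumptions. It relies on two geometric facts: a chord at distance d from the center has length $2\sqrt{r^2 - d^2}$, and a chord making angle $\theta$ with the tangent has length $2r\sin\theta$.

```python
import math
import random

def chord_exceeds_side_by_distance(rng, r=1.0):
    # First formulation: the distance D from the center is uniform on (0, r).
    d = rng.uniform(0.0, r)
    chord = 2.0 * math.sqrt(r * r - d * d)
    return chord > math.sqrt(3.0) * r  # r*sqrt(3) is the side of the inscribed triangle

def chord_exceeds_side_by_angle(rng, r=1.0):
    # Second formulation: the angle between chord and tangent is uniform on (0, 180) degrees.
    theta = math.radians(rng.uniform(0.0, 180.0))
    chord = 2.0 * r * math.sin(theta)
    return chord > math.sqrt(3.0) * r

def estimate(trial, n=200_000, seed=0):
    # Fraction of n simulated chords that are longer than the triangle side.
    rng = random.Random(seed)
    return sum(trial(rng) for _ in range(n)) / n

p_distance = estimate(chord_exceeds_side_by_distance)
p_angle = estimate(chord_exceeds_side_by_angle)
print(p_distance, p_angle)
```

Under these assumptions the first estimate should settle near 1/2 and the second near 1/3, which illustrates that the answer depends on how "random chord" is defined.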


5.4 Normal Random Variables 


We say that X is a normal random variable, or simply that X is normally distributed, 
with parameters $\mu$ and $\sigma^2$ if the density of X is given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2} \qquad -\infty < x < \infty$$

This density function is a bell-shaped curve that is symmetric about $\mu$. (See
Figure 5.5.)

The normal distribution was introduced by the French mathematician Abraham 
DeMoivre in 1733, who used it to approximate probabilities associated with bino- 
mial random variables when the binomial parameter n is large. This result was later 
extended by Laplace and others and is now encompassed in a probability theorem 
known as the central limit theorem, which is discussed in Chapter 8. The central limit 
theorem, one of the two most important results in probability theory,† gives a
theoretical base to the often noted empirical observation that, in practice, many random
phenomena obey, at least approximately, a normal probability distribution. Some
examples of random phenomena obeying this behavior are the height of a man or
woman, the velocity in any direction of a molecule in gas, and the error made in
measuring a physical quantity.

† The other is the strong law of large numbers.

Figure 5.5 Normal density function: (a) $\mu = 0$, $\sigma = 1$; (b) arbitrary $\mu$, $\sigma^2$.

To prove that f(x) is indeed a probability density function, we need to show that 


$$\frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\,dx = 1$$


Making the substitution $y = (x - \mu)/\sigma$, we see that

$$\frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-y^2/2}\,dy$$


Hence, we must show that 


$$\int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \sqrt{2\pi}$$


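Toward this end, let $I = \int_{-\infty}^{\infty} e^{-y^2/2}\,dy$. Then

$$\begin{aligned}
I^2 &= \int_{-\infty}^{\infty} e^{-y^2/2}\,dy \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \\
    &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(y^2+x^2)/2}\,dy\,dx
\end{aligned}$$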


We now evaluate the double integral by means of a change of variables to polar
coordinates. (That is, let $x = r\cos\theta$, $y = r\sin\theta$, and $dy\,dx = r\,d\theta\,dr$.) Thus,

$$\begin{aligned}
I^2 &= \int_0^{\infty} \int_0^{2\pi} e^{-r^2/2}\, r\,d\theta\,dr \\
    &= 2\pi \int_0^{\infty} r e^{-r^2/2}\,dr \\
    &= -2\pi e^{-r^2/2}\Big|_0^{\infty} \\
    &= 2\pi
\end{aligned}$$

Hence, $I = \sqrt{2\pi}$, and the result is proved.
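
As a numerical sanity check on this result, the integral can be approximated directly. The sketch below is an illustration only; the truncation of the range to [-10, 10] and the number of steps are assumptions chosen so that the neglected tails are negligible.

```python
import math

def integrate_gaussian(lo=-10.0, hi=10.0, steps=100_000):
    # Trapezoidal rule for the integral of exp(-y^2/2) over [lo, hi].
    h = (hi - lo) / steps
    total = 0.5 * (math.exp(-lo * lo / 2) + math.exp(-hi * hi / 2))
    for k in range(1, steps):
        y = lo + k * h
        total += math.exp(-y * y / 2)
    return total * h

approx = integrate_gaussian()
print(approx, math.sqrt(2 * math.pi))
```

The two printed values should agree to many decimal places, consistent with $I = \sqrt{2\pi}$.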


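An important fact about normal random variables is that if X is normally distributed
with parameters $\mu$ and $\sigma^2$, then $Y = aX + b$ is normally distributed with
parameters $a\mu + b$ and $a^2\sigma^2$. To prove this statement, suppose that $a > 0$. (The
proof when $a < 0$ is similar.) Let $F_Y$ denote the cumulative distribution function of
Y. Then

$$\begin{aligned}
F_Y(x) &= P\{Y \le x\} \\
       &= P\{aX + b \le x\} \\
       &= P\left\{X \le \frac{x-b}{a}\right\} \\
       &= F_X\left(\frac{x-b}{a}\right)
\end{aligned}$$

where $F_X$ is the cumulative distribution function of X. By differentiation, the density
function of Y is then

$$\begin{aligned}
f_Y(x) &= \frac{1}{a} f_X\left(\frac{x-b}{a}\right) \\
       &= \frac{1}{\sqrt{2\pi}\,a\sigma} \exp\left\{-\left(\frac{x-b}{a} - \mu\right)^2 \Big/ 2\sigma^2\right\} \\
       &= \frac{1}{\sqrt{2\pi}\,a\sigma} \exp\left\{-(x - b - a\mu)^2 / 2(a\sigma)^2\right\}
\end{aligned}$$

which shows that Y is normal with parameters $a\mu + b$ and $a^2\sigma^2$.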

An important implication of the preceding result is that if X is normally distributed
with parameters $\mu$ and $\sigma^2$, then $Z = (X - \mu)/\sigma$ is normally distributed
with parameters 0 and 1. Such a random variable is said to be a standard, or a unit,
normal random variable.

We now show that the parameters $\mu$ and $\sigma^2$ of a normal random variable
represent, respectively, its expected value and variance.


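Example 4a Find E[X] and Var(X) when X is a normal random variable with parameters
$\mu$ and $\sigma^2$.

Solution Let us start by finding the mean and variance of the standard normal random
variable $Z = (X - \mu)/\sigma$. We have

$$\begin{aligned}
E[Z] &= \int_{-\infty}^{\infty} x f_Z(x)\,dx \\
     &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x e^{-x^2/2}\,dx \\
     &= -\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\Big|_{-\infty}^{\infty} \\
     &= 0
\end{aligned}$$

Thus,

$$\operatorname{Var}(Z) = E[Z^2] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x^2 e^{-x^2/2}\,dx$$

Integration by parts (with $u = x$ and $dv = x e^{-x^2/2}\,dx$) now gives

$$\begin{aligned}
\operatorname{Var}(Z) &= \frac{1}{\sqrt{2\pi}} \left( -x e^{-x^2/2}\Big|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \right) \\
  &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \\
  &= 1
\end{aligned}$$

Because $X = \mu + \sigma Z$, the preceding yields the results

$$E[X] = \mu + \sigma E[Z] = \mu$$

and

$$\operatorname{Var}(X) = \sigma^2 \operatorname{Var}(Z) = \sigma^2$$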


It is customary to denote the cumulative distribution function of a standard normal
random variable by $\Phi(x)$. That is,

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy$$

The values of $\Phi(x)$ for nonnegative x are given in Table 5.1. For negative values of x,
$\Phi(x)$ can be obtained from the relationship

$$\Phi(-x) = 1 - \Phi(x) \qquad -\infty < x < \infty \tag{4.1}$$

The proof of Equation (4.1), which follows from the symmetry of the standard normal
density, is left as an exercise. This equation states that if Z is a standard normal
random variable, then

$$P\{Z \le -x\} = P\{Z > x\} \qquad -\infty < x < \infty$$


Since $Z = (X - \mu)/\sigma$ is a standard normal random variable whenever X is normally
distributed with parameters $\mu$ and $\sigma^2$, it follows that the distribution function of X
can be expressed as

$$F_X(a) = P\{X \le a\} = P\left(\frac{X - \mu}{\sigma} \le \frac{a - \mu}{\sigma}\right) = \Phi\left(\frac{a - \mu}{\sigma}\right)$$

Table 5.1 Area Φ(x) Under the Standard Normal Curve to the Left of x.

 x    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09

0.0  .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1  .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2  .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
0.3  .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
0.4  .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
0.5  .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
0.6  .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
0.7  .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
0.8  .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
0.9  .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
1.0  .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
1.1  .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
1.2  .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
1.3  .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
1.4  .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
1.5  .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
1.6  .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
1.7  .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
1.8  .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
1.9  .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
2.0  .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
2.1  .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
2.2  .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
2.3  .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
2.4  .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
2.5  .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
2.6  .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7  .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8  .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
2.9  .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
3.0  .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
3.1  .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
3.2  .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
3.3  .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
3.4  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998

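Example 4b If X is a normal random variable with parameters $\mu = 3$ and $\sigma^2 = 9$, find
(a) $P\{2 < X < 5\}$; (b) $P\{X > 0\}$; (c) $P\{|X - 3| > 6\}$.

Solution (a)

$$\begin{aligned}
P\{2 < X < 5\} &= P\left\{\frac{2-3}{3} < \frac{X-3}{3} < \frac{5-3}{3}\right\} \\
  &= P\left\{-\frac{1}{3} < Z < \frac{2}{3}\right\} \\
  &= \Phi\left(\frac{2}{3}\right) - \Phi\left(-\frac{1}{3}\right) \\
  &= \Phi\left(\frac{2}{3}\right) - \left[1 - \Phi\left(\frac{1}{3}\right)\right] \approx .3779
\end{aligned}$$

(b)

$$\begin{aligned}
P\{X > 0\} &= P\left\{\frac{X-3}{3} > \frac{0-3}{3}\right\} = P\{Z > -1\} \\
  &= 1 - \Phi(-1) \\
  &= \Phi(1) \\
  &\approx .8413
\end{aligned}$$

(c)

$$\begin{aligned}
P\{|X - 3| > 6\} &= P\{X > 9\} + P\{X < -3\} \\
  &= P\left\{\frac{X-3}{3} > \frac{9-3}{3}\right\} + P\left\{\frac{X-3}{3} < \frac{-3-3}{3}\right\} \\
  &= P\{Z > 2\} + P\{Z < -2\} \\
  &= 1 - \Phi(2) + \Phi(-2) \\
  &= 2[1 - \Phi(2)] \\
  &\approx .0456
\end{aligned}$$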

Example 4c An examination is frequently regarded as being good (in the sense of determining
a valid grade spread for those taking it) if the test scores of those taking the
examination can be approximated by a normal density function. (In other words, a graph
of the frequency of grade scores should have approximately the bell-shaped form of
the normal density.) The instructor often uses the test scores to estimate the normal
parameters $\mu$ and $\sigma^2$ and then assigns the letter grade A to those whose test score
is greater than $\mu + \sigma$, B to those whose score is between $\mu$ and $\mu + \sigma$, C to those
whose score is between $\mu - \sigma$ and $\mu$, D to those whose score is between $\mu - 2\sigma$
and $\mu - \sigma$, and F to those getting a score below $\mu - 2\sigma$. (This strategy is sometimes
referred to as grading “on the curve.”) Since


$$\begin{aligned}
P\{X > \mu + \sigma\} &= P\left\{\frac{X - \mu}{\sigma} > 1\right\} = 1 - \Phi(1) \approx .1587 \\
P\{\mu < X < \mu + \sigma\} &= P\left\{0 < \frac{X - \mu}{\sigma} < 1\right\} = \Phi(1) - \Phi(0) \approx .3413 \\
P\{\mu - \sigma < X < \mu\} &= P\left\{-1 < \frac{X - \mu}{\sigma} < 0\right\} = \Phi(0) - \Phi(-1) \approx .3413 \\
P\{\mu - 2\sigma < X < \mu - \sigma\} &= P\left\{-2 < \frac{X - \mu}{\sigma} < -1\right\} = \Phi(2) - \Phi(1) \approx .1359 \\
P\{X < \mu - 2\sigma\} &= P\left\{\frac{X - \mu}{\sigma} < -2\right\} = \Phi(-2) \approx .0228
\end{aligned}$$

it follows that approximately 16 percent of the class will receive an A grade on the
examination, 34 percent a B grade, 34 percent a C grade, and 14 percent a D grade;
2 percent will fail.
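
The grade shares above depend only on the cut points measured in standard deviations, so they can be tabulated in a few lines. This is an illustrative sketch; the `bands` list and the `phi` helper are assumptions introduced here, with `phi` again computed from the error function.

```python
import math

def phi(x):
    # Standard normal distribution function; math.erf handles +/- infinity.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Grade bands as (grade, lower cut, upper cut), in standard deviations from the mean.
bands = [
    ("A", 1.0, math.inf),
    ("B", 0.0, 1.0),
    ("C", -1.0, 0.0),
    ("D", -2.0, -1.0),
    ("F", -math.inf, -2.0),
]
shares = {grade: phi(hi) - phi(lo) for grade, lo, hi in bands}
for grade, share in shares.items():
    print(grade, round(share, 4))
```

Because the bands partition the real line, the five shares sum to 1, which is a quick check on any alternative set of cut points an instructor might try.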


Example 4d An expert witness in a paternity suit testifies that the length (in days) of human
gestation is approximately normally distributed with parameters $\mu = 270$ and $\sigma^2 = 100$.
The defendant in the suit is able to prove that he was out of the country during
a period that began 290 days before the birth of the child and ended 240 days before 
the birth. If the defendant was, in fact, the father of the child, what is the probability 
that the mother could have had the very long or very short gestation indicated by 
the testimony? 


Solution Let X denote the length of the gestation, and assume that the defendant 
is the father. Then the probability that the birth could occur within the indicated 
period is 


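$$\begin{aligned}
P\{X > 290 \text{ or } X < 240\} &= P\{X > 290\} + P\{X < 240\} \\
  &= P\left\{\frac{X - 270}{10} > 2\right\} + P\left\{\frac{X - 270}{10} < -3\right\} \\
  &= 1 - \Phi(2) + 1 - \Phi(3) \\
  &\approx .0241
\end{aligned}$$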


Example 4e Suppose that a binary message, either 0 or 1, must be transmitted by wire from
location A to location B. However, the data sent over the wire are subject to a channel
noise disturbance, so, to reduce the possibility of error, the value 2 is sent over
the wire when the message is 1 and the value $-2$ is sent when the message is 0. If
$x$, $x = \pm 2$, is the value sent at location A, then R, the value received at location B, is
given by $R = x + N$, where N is the channel noise disturbance. When the message
is received at location B, the receiver decodes it according to the following rule:


If $R \ge .5$, then 1 is concluded.
If $R < .5$, then 0 is concluded.


Because the channel noise is often normally distributed, we will determine the error 
probabilities when N is a standard normal random variable. 

Two types of errors can occur: One is that the message 1 can be incorrectly determined
to be 0, and the other is that 0 can be incorrectly determined to be 1. The first
type of error will occur if the message is 1 and $2 + N < .5$, whereas the second will
occur if the message is 0 and $-2 + N \ge .5$. Hence,

$$P\{\text{error} \mid \text{message is 1}\} = P\{N < -1.5\} = 1 - \Phi(1.5) \approx .0668$$

and

$$P\{\text{error} \mid \text{message is 0}\} = P\{N \ge 2.5\} = 1 - \Phi(2.5) \approx .0062$$


Example 4f Value at Risk (VAR) has become a key concept in financial calculations. The VAR of
an investment is defined as that value v such that there is only a 1 percent chance that
the loss from the investment will be greater than v. If X, the gain from an investment,
is a normal random variable with mean $\mu$ and variance $\sigma^2$, then because the loss is
equal to the negative of the gain, the VAR of such an investment is that value v such
that

$$.01 = P\{-X > v\}$$

Using that $-X$ is normal with mean $-\mu$ and variance $\sigma^2$, we see that

$$.01 = P\left\{\frac{-X + \mu}{\sigma} > \frac{v + \mu}{\sigma}\right\} = 1 - \Phi\left(\frac{v + \mu}{\sigma}\right)$$

Because, as indicated by Table 5.1, $\Phi(2.33) = .99$, we see that

$$\frac{v + \mu}{\sigma} = 2.33$$

That is,

$$v = \text{VAR} = 2.33\sigma - \mu$$

Consequently, among a set of investments all of whose gains are normally distributed,
the investment having the smallest VAR is the one having the largest value of
$\mu - 2.33\sigma$.
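
The VAR formula can be sketched in code as follows. The three investments and their parameters are hypothetical values invented purely for illustration, and the quantile is taken from Python's `statistics.NormalDist` rather than from Table 5.1, so it is about 2.326 instead of the rounded 2.33.

```python
from statistics import NormalDist

# 99th percentile of the standard normal; the text rounds this to 2.33.
Z_99 = NormalDist().inv_cdf(0.99)

def value_at_risk(mu, sigma):
    # Value v with P{-X > v} = .01 when the gain X is normal with mean mu, std dev sigma.
    return Z_99 * sigma - mu

# Hypothetical investments: name -> (mean gain, standard deviation of gain).
investments = {"bond": (2.0, 1.0), "stock": (8.0, 6.0), "fund": (5.0, 2.0)}
var_by_name = {name: value_at_risk(mu, sd) for name, (mu, sd) in investments.items()}

# The smallest VAR corresponds to the largest value of mu - Z_99 * sigma.
safest = min(var_by_name, key=var_by_name.get)
print(var_by_name, safest)
```

Note that in this made-up set the investment with the highest mean gain is not the one with the smallest VAR, since its larger standard deviation dominates.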


5.4.1 The Normal Approximation to the Binomial Distribution 


An important result in probability theory known as the DeMoivre-Laplace limit
theorem states that when n is large, a binomial random variable with parameters n
and p will have approximately the same distribution as a normal random variable
with the same mean and variance as the binomial. This result was proved originally
for the special case of $p = \frac{1}{2}$ by DeMoivre in 1733 and was then extended to general
p by Laplace in 1812. It formally states that if we “standardize” the binomial by
first subtracting its mean $np$ and then dividing the result by its standard deviation
$\sqrt{np(1 - p)}$, then the distribution function of this standardized random variable
(which has mean 0 and variance 1) will converge to the standard normal distribution
function as $n \to \infty$.


The DeMoivre-Laplace limit theorem 


If $S_n$ denotes the number of successes that occur when n independent trials, each
resulting in a success with probability p, are performed, then, for any $a < b$,

$$P\left\{a \le \frac{S_n - np}{\sqrt{np(1 - p)}} \le b\right\} \to \Phi(b) - \Phi(a)$$

as $n \to \infty$.


Because the preceding theorem is only a special case of the central limit theo- 
rem, which is presented in Chapter 8, we shall not present a proof. 

Note that we now have two possible approximations to binomial probabilities: 
the Poisson approximation, which is good when n is large and p is small, and the 
normal approximation, which can be shown to be quite good when $np(1 - p)$ is
large. (See Figure 5.6.) [The normal approximation will, in general, be quite good
for values of n satisfying $np(1 - p) \ge 10$.]

Figure 5.6 The probability mass function of a binomial (n, p) random variable becomes
more and more “normal” as n becomes larger and larger.


Example 4g Let X be the number of times that a fair coin that is flipped 40 times lands on heads.
Find the probability that X = 20. Use the normal approximation and then compare
it with the exact solution.


Solution To employ the normal approximation, note that because the binomial is
a discrete integer-valued random variable, whereas the normal is a continuous random
variable, it is best to write $P\{X = i\}$ as $P\{i - 1/2 < X < i + 1/2\}$ before
applying the normal approximation (this is called the continuity correction). Doing
so gives

$$\begin{aligned}
P\{X = 20\} &= P\{19.5 < X < 20.5\} \\
  &= P\left\{\frac{19.5 - 20}{\sqrt{10}} < \frac{X - 20}{\sqrt{10}} < \frac{20.5 - 20}{\sqrt{10}}\right\} \\
  &\approx P\left\{-.16 < \frac{X - 20}{\sqrt{10}} < .16\right\} \\
  &\approx \Phi(.16) - \Phi(-.16) \approx .1272
\end{aligned}$$

The exact result is

$$P\{X = 20\} = \binom{40}{20}\left(\frac{1}{2}\right)^{40} \approx .1254$$
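
The comparison between the continuity-corrected approximation and the exact binomial probability can be reproduced as below. This is a sketch with illustrative function names; it evaluates $\Phi$ at the unrounded argument $.5/\sqrt{10}$, so the approximate value comes out slightly closer to the exact one than the table-based figure.

```python
import math

def phi(x):
    # Standard normal distribution function via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binomial_pmf(i, n, p):
    # Exact P{X = i} for a binomial (n, p) random variable.
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)

def normal_approx_pmf(i, n, p):
    # Continuity correction: approximate P{X = i} by P{i - 1/2 < X < i + 1/2}.
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return phi((i + 0.5 - mean) / sd) - phi((i - 0.5 - mean) / sd)

exact = binomial_pmf(20, 40, 0.5)
approx = normal_approx_pmf(20, 40, 0.5)
print(round(exact, 4), round(approx, 4))
```

The same two functions can be called with other values of i, n, and p to see how the quality of the approximation changes as $np(1-p)$ grows or shrinks.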


Example 4h The ideal size of a first-year class at a particular college is 150 students. The college,
knowing from past experience that, on the average, only 30 percent of those accepted
for admission will actually attend, uses a policy of approving the applications of 450
students. Compute the probability that more than 150 first-year students attend this
college.


Solution If X denotes the number of students who attend, then X is a binomial ran- 
dom variable with parameters n = 450 and p = .3. Using the continuity correction, 
we see that the normal approximation yields 


$$\begin{aligned}
P\{X \ge 150.5\} &= P\left\{\frac{X - (450)(.3)}{\sqrt{450(.3)(.7)}} \ge \frac{150.5 - (450)(.3)}{\sqrt{450(.3)(.7)}}\right\} \\
  &\approx 1 - \Phi(1.59) \\
  &\approx .0559
\end{aligned}$$

Hence, less than 6 percent of the time do more than 150 of the first 450 accepted
actually attend. (What independence assumptions have we made?)


Example 4i To determine the effectiveness of a certain diet in reducing the amount of cholesterol
in the bloodstream, 100 people are put on the diet. After they have been on the diet 
for a sufficient length of time, their cholesterol count will be taken. The nutritionist 
running this experiment has decided to endorse the diet if at least 65 percent of the 
people have a lower cholesterol count after going on the diet. What is the proba- 
bility that the nutritionist endorses the new diet if, in fact, it has no effect on the 
cholesterol level? 


Solution Let us assume that if the diet has no effect on the cholesterol count, then,
strictly by chance, each person’s count will be lower than it was before the diet with
probability $\frac{1}{2}$. Hence, if X is the number of people whose count is lowered, then the
probability that the nutritionist will endorse the diet when it actually has no effect
on the cholesterol count is

$$\begin{aligned}
\sum_{i=65}^{100} \binom{100}{i} \left(\frac{1}{2}\right)^{100} &= P\{X \ge 64.5\} \\
  &= P\left\{\frac{X - (100)(\frac{1}{2})}{\sqrt{100(\frac{1}{2})(\frac{1}{2})}} \ge 2.9\right\} \\
  &\approx 1 - \Phi(2.9) \\
  &\approx .0019
\end{aligned}$$


Example 4j Fifty-two percent of the residents of New York City are in favor of outlawing cigarette
smoking on university campuses. Approximate the probability that more than 50 
percent of a random sample of n people from New York are in favor of this prohibi- 
tion when 


(a) n=11 
(b) n=101 
(c) n= 1001 


How large would n have to be to make this probability exceed .95? 


Solution Let N denote the number of residents of New York City. To answer the
preceding question, we must first understand that a random sample of size n is a
sample such that the n people were chosen in such a manner that each of the
$\binom{N}{n}$ subsets of n people had the same chance of being the chosen subset. Consequently,
$S_n$, the number of people in the sample who are in favor of the smoking prohibition,
is a hypergeometric random variable. That is, $S_n$ has the same distribution as the
number of white balls obtained when n balls are chosen from an urn of N balls, of
which .52N are white. But because N and .52N are both large in comparison with the
sample size n, it follows from the binomial approximation to the hypergeometric (see
Section 4.8.3) that the distribution of $S_n$ is closely approximated by a binomial
distribution with parameters n and p = .52. The normal approximation to the binomial
distribution then shows that


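$$\begin{aligned}
P\{S_n > .5n\} &= P\left\{\frac{S_n - .52n}{\sqrt{n(.52)(.48)}} > \frac{.5n - .52n}{\sqrt{n(.52)(.48)}}\right\} \\
  &= P\left\{\frac{S_n - .52n}{\sqrt{n(.52)(.48)}} > -.04\sqrt{n}\right\} \\
  &\approx \Phi(.04\sqrt{n})
\end{aligned}$$

Thus,

$$P\{S_n > .5n\} \approx \begin{cases}
\Phi(.1328) = .5528, & \text{if } n = 11 \\
\Phi(.4020) = .6562, & \text{if } n = 101 \\
\Phi(1.2665) = .8973, & \text{if } n = 1001
\end{cases}$$

In order for this probability to be at least .95, we would need $\Phi(.04\sqrt{n}) > .95$.
Because $\Phi(x)$ is an increasing function and $\Phi(1.645) = .95$, this means that

$$.04\sqrt{n} > 1.645$$

or

$$n \ge 1691.266$$

That is, the sample size would have to be at least 1692.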
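
The calculation can be repeated without rounding the constants. The sketch below is illustrative only: the function name and the brute-force search are assumptions made here, and it keeps the exact ratio $.02/\sqrt{(.52)(.48)}$ instead of .04 and the exact .95 quantile instead of 1.645.

```python
import math

def phi(x):
    # Standard normal distribution function via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_majority_in_favor(n, p=0.52):
    # Normal approximation to P{S_n > .5n} when S_n is binomial (n, p).
    return phi((p - 0.5) * n / math.sqrt(n * p * (1 - p)))

probs = {n: prob_majority_in_favor(n) for n in (11, 101, 1001)}

# Smallest n whose approximate probability reaches .95, found by direct search.
n_min = 1
while prob_majority_in_favor(n_min) < 0.95:
    n_min += 1
print(probs, n_min)
```

With unrounded constants the search stops a few units below 1692; the difference comes entirely from rounding .04 and 1.645 in the hand calculation.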


Historical notes concerning the normal distribution 


The normal distribution was introduced by the French mathematician Abra- 
ham DeMoivre in 1733. DeMoivre, who used this distribution to approximate 
probabilities connected with coin tossing, called it the exponential bell-shaped 
curve. Its usefulness, however, became truly apparent only in 1809, when the 
famous German mathematician Karl Friedrich Gauss used it as an integral part 
of his approach to predicting the location of astronomical entities. As a result, it 
became common after this time to call it the Gaussian distribution. 


During the mid- to late 19th century, however, most statisticians started to 
believe that the majority of data sets would have histograms conforming to the 
Gaussian bell-shaped form. Indeed, it came to be accepted that it was “normal” 
for any well-behaved data set to follow this curve. As a result, following the lead 
of the British statistician Karl Pearson, people began referring to the Gaussian 
curve by calling it simply the normal curve. (A partial explanation as to why 
so many data sets conform to the normal curve is provided by the central limit 
theorem, which is presented in Chapter 8.)


Abraham DeMoivre (1667-1754)


Today there is no shortage of statistical consultants, many of whom ply their 
trade in the most elegant of settings. However, the first of their breed worked, 
in the early years of the 18th century, out of a dark, grubby betting shop in 
Long Acres, London, known as Slaughter’s Coffee House. He was Abraham 
DeMoivre, a Protestant refugee from Catholic France, and, for a price, he would 
compute the probability of gambling bets in all types of games of chance. 

Although DeMoivre, the discoverer of the normal curve, made his living at 
the coffee shop, he was a mathematician of recognized abilities. Indeed, he was 
a member of the Royal Society and was reported to be an intimate of Isaac 
Newton. 

Listen to Karl Pearson imagining DeMoivre at work at Slaughter’s Coffee 
House: “I picture DeMoivre working at a dirty table in the coffee house with a 
broken-down gambler beside him and Isaac Newton walking through the crowd 
to his corner to fetch out his friend. It would make a great picture for an inspired 
artist.” 


Karl Friedrich Gauss 


Karl Friedrich Gauss (1777-1855), one of the earliest users of the normal curve, 
was one of the greatest mathematicians of all time. Listen to the words of the 
well-known mathematical historian E. T. Bell, as expressed in his 1954 book 
Men of Mathematics: In a chapter entitled “The Prince of Mathematicians,” he 
writes, “Archimedes, Newton, and Gauss; these three are in a class by themselves 
among the great mathematicians, and it is not for ordinary mortals to attempt to 
rank them in order of merit. All three started tidal waves in both pure and applied 
mathematics. Archimedes esteemed his pure mathematics more highly than its 
applications; Newton appears to have found the chief justification for his mathematical
inventions in the scientific uses to which he put them; while Gauss declared it was all
one to him whether he worked on the pure or on the applied side.” 


5.5 Exponential Random Variables 


A continuous random variable whose probability density function is given, for some
$\lambda > 0$, by

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$

is said to be an exponential random variable (or, more simply, is said to be exponentially
distributed) with parameter $\lambda$. The cumulative distribution function F(a) of an
exponential random variable is given by

$$\begin{aligned}
F(a) &= P\{X \le a\} \\
     &= \int_0^a \lambda e^{-\lambda x}\,dx \\
     &= -e^{-\lambda x}\Big|_0^a \\
     &= 1 - e^{-\lambda a} \qquad a \ge 0
\end{aligned}$$

Note that $F(\infty) = \int_0^{\infty} \lambda e^{-\lambda x}\,dx = 1$, as, of course, it must. The parameter $\lambda$ will now
be shown to equal the reciprocal of the expected value.


Let X be an exponential random variable with parameter $\lambda$. Calculate (a) E[X] and
(b) Var(X).


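Solution (a) Since the density function is given by

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

we obtain, for $n > 0$,

$$E[X^n] = \int_0^{\infty} x^n \lambda e^{-\lambda x}\,dx$$

Integrating by parts (with $\lambda e^{-\lambda x}\,dx = dv$ and $u = x^n$) yields

$$\begin{aligned}
E[X^n] &= -x^n e^{-\lambda x}\Big|_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\, n x^{n-1}\,dx \\
       &= 0 + \frac{n}{\lambda} \int_0^{\infty} \lambda e^{-\lambda x} x^{n-1}\,dx \\
       &= \frac{n}{\lambda} E[X^{n-1}]
\end{aligned}$$

Letting n = 1 and then n = 2 gives

$$E[X] = \frac{1}{\lambda}$$

$$E[X^2] = \frac{2}{\lambda} E[X] = \frac{2}{\lambda^2}$$

(b) Hence,

$$\operatorname{Var}(X) = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$$

Thus, the mean of the exponential is the reciprocal of its parameter $\lambda$, and the
variance is the mean squared.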
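
The recursion $E[X^n] = (n/\lambda)E[X^{n-1}]$ can be checked against direct numerical integration. The sketch below is an illustration under stated assumptions: $\lambda = 0.5$ is an arbitrary choice, and the integral is truncated at an upper limit large enough that the neglected tail is negligible.

```python
import math

def moment_by_recursion(n, lam):
    # E[X^n] = (n / lam) * E[X^(n-1)], starting from E[X^0] = 1.
    return 1.0 if n == 0 else (n / lam) * moment_by_recursion(n - 1, lam)

def moment_by_integration(n, lam, upper=200.0, steps=200_000):
    # Trapezoidal approximation of the integral of x^n * lam * exp(-lam * x) on [0, upper].
    h = upper / steps
    f = lambda x: x**n * lam * math.exp(-lam * x)
    total = 0.5 * (f(0.0) + f(upper))
    for k in range(1, steps):
        total += f(k * h)
    return total * h

lam = 0.5  # arbitrary illustrative rate
mean = moment_by_recursion(1, lam)
variance = moment_by_recursion(2, lam) - mean**2
print(mean, variance, moment_by_integration(1, lam), moment_by_integration(2, lam))
```

For this choice of $\lambda$ the recursion gives mean $1/\lambda = 2$ and variance $1/\lambda^2 = 4$, and the integrated moments should match the recursive ones closely.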


In practice, the exponential distribution often arises as the distribution of the 
amount of time until some specific event occurs. For instance, the amount of time 
(starting from now) until an earthquake occurs, or until a new war breaks out, or 
until a telephone call you receive turns out to be a wrong number are all random 
variables that tend in practice to have exponential distributions. (For a theoretical 
explanation of this phenomenon, see Section 4.7.) 


Suppose that the length of a phone call in minutes is an exponential random variable
with parameter $\lambda = \frac{1}{10}$. If someone arrives immediately ahead of you at a public
telephone booth, find the probability that you will have to wait


(a) more than 10 minutes; 
(b) between 10 and 20 minutes. 


Solution Let X denote the length of the call made by the person in the booth. Then
the desired probabilities are

(a)

$$P\{X > 10\} = 1 - F(10) = e^{-1} \approx .368$$

(b)

$$P\{10 < X < 20\} = F(20) - F(10) = e^{-1} - e^{-2} \approx .233$$


We say that a nonnegative random variable X is memoryless if

$$P\{X > s + t \mid X > t\} = P\{X > s\} \qquad \text{for all } s, t \ge 0 \tag{5.1}$$

If we think of X as being the lifetime of some instrument, Equation (5.1) states that
the probability that the instrument survives for at least s + t hours, given that it has
survived t hours, is the same as the initial probability that it survives for at least
s hours. In other words, if the instrument is alive at age t, the distribution of the
remaining amount of time that it survives is the same as the original lifetime
distribution. (That is, it is as if the instrument does not “remember” that it has already
been in use for a time t.)

Equation (5.1) is equivalent to

$$\frac{P\{X > s + t, X > t\}}{P\{X > t\}} = P\{X > s\}$$

or

$$P\{X > s + t\} = P\{X > s\}P\{X > t\} \tag{5.2}$$

Since Equation (5.2) is satisfied when X is exponentially distributed (for
$e^{-\lambda(s+t)} = e^{-\lambda s}e^{-\lambda t}$), it follows that exponentially distributed random variables are memoryless.
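
The memoryless property can be illustrated both from the survival function and by simulation. In this sketch the rate, the times s and t, the sample size, and the seed are all arbitrary illustrative choices; `random.expovariate` from the Python standard library is used to draw exponential samples.

```python
import math
import random

def survival(x, lam):
    # P{X > x} for an exponential random variable with parameter lam.
    return math.exp(-lam * x)

lam, s, t = 0.1, 5.0, 12.0

# Analytic check of Equation (5.1): P{X > s + t | X > t} versus P{X > s}.
conditional = survival(s + t, lam) / survival(t, lam)
unconditional = survival(s, lam)

# Empirical check: among simulated lifetimes exceeding t, how many exceed s + t?
rng = random.Random(1)
samples = [rng.expovariate(lam) for _ in range(200_000)]
survivors = [x for x in samples if x > t]
empirical = sum(x > s + t for x in survivors) / len(survivors)
print(conditional, unconditional, empirical)
```

The first two values agree up to floating-point error, and the simulated fraction should be close to them, which is what Equation (5.1) asserts.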


Consider a post office that is staffed by two clerks. Suppose that when Mr. Smith 
enters the system, he discovers that Ms. Jones is being served by one of the clerks 
and Mr. Brown by the other. Suppose also that Mr. Smith is told that his service will 
begin as soon as either Ms. Jones or Mr. Brown leaves. If the amount of time that 
a clerk spends with a customer is exponentially distributed with parameter $\lambda$, what
is the probability that of the three customers, Mr. Smith is the last to leave the post 
office? 


Solution The answer is obtained by reasoning as follows: Consider the time at which 
Mr. Smith first finds a free clerk. At this point, either Ms. Jones or Mr. Brown would 
have just left, and the other one would still be in service. However, because the 
exponential is memoryless, it follows that the additional amount of time that this 
other person (either Ms. Jones or Mr. Brown) would still have to spend in the post 
office is exponentially distributed with parameter $\lambda$. That is, it is the same as if service
for that person were just starting at this point. Hence, by symmetry, the probability
that the remaining person finishes before Smith leaves must equal $\frac{1}{2}$.


It turns out that not only is the exponential distribution memoryless, but it is
also the unique distribution possessing this property. To see this, suppose that X is
memoryless and let $\bar F(x) = P\{X > x\}$. Then, by Equation (5.2),

$$\bar F(s + t) = \bar F(s)\bar F(t)$$

That is, $\bar F(\cdot)$ satisfies the functional equation

$$g(s + t) = g(s)g(t)$$

However, it turns out that the only right continuous solution of this functional
equation is†

$$g(x) = e^{-\lambda x} \tag{5.3}$$

and, since a distribution function is always right continuous, we must have

$$\bar F(x) = e^{-\lambda x} \quad \text{or} \quad F(x) = P\{X \le x\} = 1 - e^{-\lambda x}$$

which shows that X is exponentially distributed.


Suppose that the number of miles that a car can run before its battery wears out is 
exponentially distributed with an average value of 10,000 miles. If a person desires 
to take a 5000-mile trip, what is the probability that he or she will be able to com- 
plete the trip without having to replace the car battery? What can be said when the 
distribution is not exponential? 


Solution It follows by the memoryless property of the exponential distribution that
the remaining lifetime (in thousands of miles) of the battery is exponential with
parameter $\lambda = \frac{1}{10}$. Hence, the desired probability is

$$P\{\text{remaining lifetime} > 5\} = 1 - F(5) = e^{-5\lambda} = e^{-1/2} \approx .607$$

However, if the lifetime distribution F is not exponential, then the relevant probability
is

$$P\{\text{lifetime} > t + 5 \mid \text{lifetime} > t\} = \frac{1 - F(t + 5)}{1 - F(t)}$$

where t is the number of miles that the battery had been in use prior to the start of
the trip. Therefore, if the distribution is not exponential, additional information is
needed (namely, the value of t) before the desired probability can be calculated.


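A variation of the exponential distribution is the distribution of a random variable
that is equally likely to be either positive or negative and whose absolute value
is exponentially distributed with parameter $\lambda$, $\lambda \ge 0$. Such a random variable is said
to have a Laplace distribution,‡ and its density is given by

$$f(x) = \frac{1}{2}\lambda e^{-\lambda|x|} \qquad -\infty < x < \infty$$

Its distribution function is given by

$$F(x) = \begin{cases}
\dfrac{1}{2}\displaystyle\int_{-\infty}^{x} \lambda e^{\lambda y}\,dy & x < 0 \\[2ex]
\dfrac{1}{2}\displaystyle\int_{-\infty}^{0} \lambda e^{\lambda y}\,dy + \dfrac{1}{2}\displaystyle\int_{0}^{x} \lambda e^{-\lambda y}\,dy & x > 0
\end{cases}
= \begin{cases}
\dfrac{1}{2}e^{\lambda x} & x < 0 \\[1ex]
1 - \dfrac{1}{2}e^{-\lambda x} & x > 0
\end{cases}$$

† One can prove Equation (5.3) as follows: If $g(s + t) = g(s)g(t)$, then

$$g\left(\frac{2}{n}\right) = g\left(\frac{1}{n} + \frac{1}{n}\right) = g^2\left(\frac{1}{n}\right)$$

and repeating this yields $g(m/n) = g^m(1/n)$. Also,

$$g(1) = g\left(\frac{1}{n} + \frac{1}{n} + \cdots + \frac{1}{n}\right) = g^n\left(\frac{1}{n}\right) \quad \text{or} \quad g\left(\frac{1}{n}\right) = (g(1))^{1/n}$$

Hence, $g(m/n) = (g(1))^{m/n}$, which, since g is right continuous, implies that $g(x) = (g(1))^x$. Because
$g(1) = \left(g\left(\frac{1}{2}\right)\right)^2 \ge 0$, we obtain $g(x) = e^{-\lambda x}$, where $\lambda = -\log(g(1))$.

‡ It also is sometimes called the double exponential random variable.
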


Consider again Example 4e, which supposes that a binary message is to be transmitted
from A to B, with the value 2 being sent when the message is 1 and $-2$ when it
is 0. However, suppose now that rather than being a standard normal random variable,
the channel noise N is a Laplacian random variable with parameter $\lambda = 1$.
Suppose again that if R is the value received at location B, then the message is
decoded as follows:


If $R \ge .5$, then 1 is concluded.
If $R < .5$, then 0 is concluded.


In this case, where the noise is Laplacian with parameter $\lambda = 1$, the two types of
errors will have probabilities given by

$$P\{\text{error} \mid \text{message 1 is sent}\} = P\{N < -1.5\} = \frac{1}{2}e^{-1.5} \approx .1116$$

$$P\{\text{error} \mid \text{message 0 is sent}\} = P\{N \ge 2.5\} = \frac{1}{2}e^{-2.5} \approx .041$$

On comparing this with the results of Example 4e, we see that the error probabilities
are higher when the noise is Laplacian with $\lambda = 1$ than when it is a standard normal
variable.
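
The comparison between the two noise models can be made concrete in code. The following sketch is illustrative: the function names are assumptions, the threshold .5 and signal values ±2 are taken from the example, and the Laplace distribution function used is $\frac{1}{2}e^{\lambda x}$ for $x < 0$ and $1 - \frac{1}{2}e^{-\lambda x}$ otherwise.

```python
import math

def normal_cdf(x):
    # Standard normal distribution function via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def laplace_cdf(x, lam=1.0):
    # Distribution function of a Laplace random variable with parameter lam.
    if x < 0:
        return 0.5 * math.exp(lam * x)
    return 1.0 - 0.5 * math.exp(-lam * x)

def error_probabilities(cdf, threshold=0.5, signal=2.0):
    # Message 1 sends +signal and is misread when signal + N < threshold.
    miss_one = cdf(threshold - signal)
    # Message 0 sends -signal and is misread when -signal + N >= threshold.
    miss_zero = 1.0 - cdf(threshold + signal)
    return miss_one, miss_zero

normal_errors = error_probabilities(normal_cdf)
laplace_errors = error_probabilities(laplace_cdf)
print(normal_errors, laplace_errors)
```

Both Laplacian error probabilities exceed their normal counterparts, reflecting the heavier tails of the Laplace density.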


5.5.1 Hazard Rate Functions 

Consider a positive continuous random variable X that we interpret as being the 
lifetime of some item. Let X have distribution function F and density f. The hazard 
rate (sometimes called the failure rate) function $\lambda(t)$ of F is defined by

$$\lambda(t) = \frac{f(t)}{\bar F(t)}, \qquad \text{where } \bar F = 1 - F$$

To interpret $\lambda(t)$, suppose that the item has survived for a time t and we desire the
probability that it will not survive for an additional time dt. That is, consider
$P\{X \in (t, t + dt) \mid X > t\}$. Now,

$$\begin{aligned}
P\{X \in (t, t + dt) \mid X > t\} &= \frac{P\{X \in (t, t + dt), X > t\}}{P\{X > t\}} \\
  &= \frac{P\{X \in (t, t + dt)\}}{P\{X > t\}} \\
  &\approx \frac{f(t)}{\bar F(t)}\,dt
\end{aligned}$$

Thus, $\lambda(t)$ represents the conditional probability intensity that a t-unit-old item will
fail.

Suppose now that the lifetime distribution is exponential. Then, by the memory- 
less property, it follows that the distribution of remaining life for a t-year-old item is 
the same as that for a new item. Hence, $\lambda(t)$ should be constant. In fact, this checks
out, since

$$\lambda(t) = \frac{f(t)}{\bar F(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$$

Thus, the failure rate function for the exponential distribution is constant. The parameter
$\lambda$ is often referred to as the rate of the distribution.

It turns out that the failure rate function $\lambda(s)$, $s \ge 0$, uniquely determines the
distribution function $F$. To prove this, we integrate $\lambda(s)$ from 0 to $t$ to obtain

\[
\begin{aligned}
\int_0^t \lambda(s)\,ds &= \int_0^t \frac{f(s)}{1 - F(s)}\,ds \\
&= -\log(1 - F(s))\Big|_0^t \\
&= -\log(1 - F(t)) + \log(1 - F(0)) \\
&= -\log(1 - F(t))
\end{aligned}
\]

where the second equality used that $f(s) = \frac{d}{ds}F(s)$ and the final equality used that
$F(0) = 0$. Solving the preceding equation for $F(t)$ gives

\[
F(t) = 1 - \exp\left\{-\int_0^t \lambda(s)\,ds\right\} \qquad (5.4)
\]


Hence, a distribution function of a positive continuous random variable can be
specified by giving its hazard rate function. For instance, if a random variable has a
linear hazard rate function, that is, if

\[
\lambda(t) = a + bt
\]

then its distribution function is given by

\[
F(t) = 1 - e^{-at - bt^2/2}
\]

and differentiation yields its density, namely,

\[
f(t) = (a + bt)\,e^{-(at + bt^2/2)}, \quad t \ge 0
\]

When $a = 0$, the preceding equation is known as the Rayleigh density function.
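
Equation (5.4) can be checked numerically for the linear hazard rate. The sketch below is illustrative only: the values of `a`, `b`, and `t` are arbitrary choices, and the helper `cdf_from_hazard` is a hypothetical name, not something from the text. It integrates the hazard with the trapezoidal rule and compares the result with the closed form $1 - e^{-at - bt^2/2}$.

```python
import math

def cdf_from_hazard(hazard, t, steps=10_000):
    """Equation (5.4): F(t) = 1 - exp(-integral_0^t hazard(s) ds), using the trapezoidal rule."""
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, steps):
        total += hazard(k * h)
    return 1.0 - math.exp(-total * h)

a, b = 0.5, 2.0   # illustrative hazard parameters (assumed, not from the text)
t = 1.3           # illustrative evaluation point

numeric = cdf_from_hazard(lambda s: a + b * s, t)
closed_form = 1.0 - math.exp(-a * t - b * t * t / 2.0)
print(numeric, closed_form)
```

Passing a constant hazard to the same helper recovers the exponential distribution function, in line with the constant failure rate derived earlier in this section.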


One often hears that the death rate of a person who smokes is, at each age, twice that
of a nonsmoker. What does this mean? Does it mean that a nonsmoker has twice the
probability of surviving a given number of years as does a smoker of the same age?

Solution If $\lambda_s(t)$ denotes the hazard rate of a smoker of age $t$ and $\lambda_n(t)$ that of a
nonsmoker of age $t$, then the statement at issue is equivalent to the statement that

\[
\lambda_s(t) = 2\lambda_n(t)
\]

The probability that an $A$-year-old nonsmoker will survive until age $B$, $A < B$, is

\[
\begin{aligned}
P\{A\text{-year-old nonsmoker reaches age } B\}
&= P\{\text{nonsmoker's lifetime} > B \mid \text{nonsmoker's lifetime} > A\} \\
&= \frac{1 - F_{\text{non}}(B)}{1 - F_{\text{non}}(A)} \\
&= \frac{\exp\left\{-\int_0^B \lambda_n(t)\,dt\right\}}{\exp\left\{-\int_0^A \lambda_n(t)\,dt\right\}} \quad \text{from (5.4)} \\
&= \exp\left\{-\int_A^B \lambda_n(t)\,dt\right\}
\end{aligned}
\]

whereas the corresponding probability for a smoker is, by the same reasoning,

\[
\begin{aligned}
P\{A\text{-year-old smoker reaches age } B\}
&= \exp\left\{-\int_A^B \lambda_s(t)\,dt\right\} \\
&= \exp\left\{-2\int_A^B \lambda_n(t)\,dt\right\} \\
&= \left[\exp\left\{-\int_A^B \lambda_n(t)\,dt\right\}\right]^2
\end{aligned}
\]

In other words, for two people of the same age, one of whom is a smoker and
the other a nonsmoker, the probability that the smoker survives to any given age
is the square (not one-half) of the corresponding probability for a nonsmoker. For
instance, if $\lambda_n(t) = \frac{1}{30}$, $50 \le t \le 60$, then the probability that a 50-year-old nonsmoker
reaches age 60 is $e^{-1/3} \approx .7165$, whereas the corresponding probability for a smoker
is $e^{-2/3} \approx .5134$.
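
The closing numbers of this example can be reproduced directly. This is a small sketch, assuming a constant nonsmoker hazard rate of 1/30 between ages 50 and 60 (the value consistent with the survival probability $e^{-1/3}$ quoted above); `survival_prob` is an illustrative helper name.

```python
import math

def survival_prob(hazard_rate, start_age, end_age):
    """P{reach end_age | alive at start_age} when the hazard is constant on [start_age, end_age]."""
    return math.exp(-hazard_rate * (end_age - start_age))

nonsmoker_rate = 1 / 30                      # assumed constant hazard on ages 50 to 60
p_nonsmoker = survival_prob(nonsmoker_rate, 50, 60)
p_smoker = survival_prob(2 * nonsmoker_rate, 50, 60)   # smoker's hazard is doubled

print(round(p_nonsmoker, 4), round(p_smoker, 4))
print(math.isclose(p_smoker, p_nonsmoker ** 2))
```

Doubling the hazard squares the survival probability; it does not halve it.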


Equation (5.4) can be used to show that only exponential random variables are
memoryless. For if a random variable has a memoryless distribution, then the
distribution of the remaining life of an $s$-year-old item must be the same for all $s$.
That is, if $X$ is memoryless, then $\lambda(s) = c$. But, by Equation (5.4), this implies that
the distribution function of $X$ is $F(t) = 1 - e^{-ct}$, showing that $X$ is exponential with
rate $c$.
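

5.6 Other Continuous Distributions

5.6.1 The Gamma Distribution

A random variable is said to have a gamma distribution with parameters $(\alpha, \lambda)$, $\lambda > 0$,
$\alpha > 0$, if its density function is given by

\[
f(x) = \begin{cases} \dfrac{\lambda e^{-\lambda x}(\lambda x)^{\alpha - 1}}{\Gamma(\alpha)} & x \ge 0 \\[2mm] 0 & x < 0 \end{cases}
\]

where $\Gamma(\alpha)$, called the gamma function, is defined as

\[
\Gamma(\alpha) = \int_0^\infty e^{-y} y^{\alpha - 1}\,dy
\]

Integration of $\Gamma(\alpha)$ by parts yields

\[
\begin{aligned}
\Gamma(\alpha) &= -e^{-y} y^{\alpha - 1}\Big|_0^\infty + \int_0^\infty e^{-y}(\alpha - 1) y^{\alpha - 2}\,dy \\
&= (\alpha - 1)\int_0^\infty e^{-y} y^{\alpha - 2}\,dy \qquad (6.1) \\
&= (\alpha - 1)\Gamma(\alpha - 1)
\end{aligned}
\]

For integral values of $\alpha$, say, $\alpha = n$, we obtain, by applying Equation (6.1) repeatedly,

\[
\begin{aligned}
\Gamma(n) &= (n - 1)\Gamma(n - 1) \\
&= (n - 1)(n - 2)\Gamma(n - 2) \\
&= \cdots \\
&= (n - 1)(n - 2) \cdots 3 \cdot 2\,\Gamma(1)
\end{aligned}
\]

Since $\Gamma(1) = \int_0^\infty e^{-x}\,dx = 1$, it follows that, for integral values of $n$,

\[
\Gamma(n) = (n - 1)!
\]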
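
The recursion in Equation (6.1) and the factorial identity can be spot-checked with the gamma function from Python's standard library. This is only an illustrative sketch; the sample values of $\alpha$ are arbitrary and `gamma_recursion_gap` is a hypothetical helper name.

```python
import math

def gamma_recursion_gap(alpha):
    """Relative difference between Gamma(alpha) and (alpha - 1) * Gamma(alpha - 1)."""
    lhs = math.gamma(alpha)
    rhs = (alpha - 1.0) * math.gamma(alpha - 1.0)
    return abs(lhs - rhs) / lhs

# Equation (6.1) for a few non-integer arguments (each greater than 1).
for alpha in (2.5, 4.0, 7.3):
    print(alpha, gamma_recursion_gap(alpha))

# Gamma(n) against (n - 1)! for small integers n.
for n in range(1, 8):
    print(n, math.gamma(n), math.factorial(n - 1))
```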


When $\alpha$ is a positive integer, say, $\alpha = n$, the gamma distribution with parameters
$(\alpha, \lambda)$ often arises in practice as the distribution of the amount of time one has to
wait until a total of $n$ events has occurred. More specifically, if events are occurring
randomly and in accordance with the three axioms of Section 4.7, then it turns out
that the amount of time one has to wait until a total of $n$ events has occurred will
be a gamma random variable with parameters $(n, \lambda)$. To prove this, let $T_n$ denote the
time at which the $n$th event occurs, and note that $T_n$ is less than or equal to $t$ if and
only if the number of events that have occurred by time $t$ is at least $n$. That is, with
$N(t)$ equal to the number of events in $[0, t]$,

\[
\begin{aligned}
P\{T_n \le t\} &= P\{N(t) \ge n\} \\
&= \sum_{j=n}^{\infty} P\{N(t) = j\} \\
&= \sum_{j=n}^{\infty} \frac{e^{-\lambda t}(\lambda t)^j}{j!}
\end{aligned}
\]

where the final identity follows because the number of events in $[0, t]$ has a
Poisson distribution with parameter $\lambda t$. Differentiation of the preceding now yields
the density function of $T_n$:

\[
\begin{aligned}
f(t) &= \sum_{j=n}^{\infty} \frac{e^{-\lambda t}\, j(\lambda t)^{j-1}\lambda}{j!} - \sum_{j=n}^{\infty} \frac{\lambda e^{-\lambda t}(\lambda t)^j}{j!} \\
&= \sum_{j=n}^{\infty} \frac{\lambda e^{-\lambda t}(\lambda t)^{j-1}}{(j - 1)!} - \sum_{j=n}^{\infty} \frac{\lambda e^{-\lambda t}(\lambda t)^j}{j!} \\
&= \frac{\lambda e^{-\lambda t}(\lambda t)^{n-1}}{(n - 1)!}
\end{aligned}
\]

Hence, $T_n$ has the gamma distribution with parameters $(n, \lambda)$. (This distribution is
often referred to in the literature as the $n$-Erlang distribution.) Note that when $n = 1$,
this distribution reduces to the exponential distribution.

The gamma distribution with $\lambda = \frac{1}{2}$ and $\alpha = n/2$, $n$ a positive integer, is called
the $\chi_n^2$ (read "chi-squared") distribution with $n$ degrees of freedom. The chi-squared
distribution often arises in practice as the distribution of the error involved in
attempting to hit a target in $n$-dimensional space when each coordinate error is
normally distributed. This distribution will be studied in Chapter 6, where its relation to
the normal distribution is detailed.

Example 6a
Let $X$ be a gamma random variable with parameters $\alpha$ and $\lambda$. Calculate (a) $E[X]$
and (b) $\mathrm{Var}(X)$.

Solution (a)

\[
\begin{aligned}
E[X] &= \frac{1}{\Gamma(\alpha)} \int_0^\infty \lambda x e^{-\lambda x}(\lambda x)^{\alpha - 1}\,dx \\
&= \frac{1}{\lambda\Gamma(\alpha)} \int_0^\infty \lambda e^{-\lambda x}(\lambda x)^{\alpha}\,dx \\
&= \frac{\Gamma(\alpha + 1)}{\lambda\Gamma(\alpha)} \\
&= \frac{\alpha}{\lambda} \quad \text{by Equation (6.1)}
\end{aligned}
\]

(b) By first calculating $E[X^2]$, we can show that

\[
\mathrm{Var}(X) = \frac{\alpha}{\lambda^2}
\]

The details are left as an exercise.
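
The waiting-time argument and the moments of Example 6a can be checked numerically. The sketch below is illustrative: the parameter values and the midpoint-rule helper `integrate` are assumptions made for the demonstration. It compares the Poisson-sum form of $P\{T_n \le t\}$ with the integral of the gamma density, and then estimates the mean and variance.

```python
import math

def gamma_density(x, alpha, lam):
    """Gamma density with parameters (alpha, lam) for x >= 0."""
    return lam * math.exp(-lam * x) * (lam * x) ** (alpha - 1) / math.gamma(alpha)

def erlang_cdf(t, n, lam):
    """P{T_n <= t} = P{N(t) >= n} = 1 - sum_{j < n} e^{-lam t} (lam t)^j / j!."""
    return 1.0 - sum(math.exp(-lam * t) * (lam * t) ** j / math.factorial(j)
                     for j in range(n))

def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

n, lam, t = 3, 2.0, 1.5   # illustrative values (assumed)

via_poisson_sum = erlang_cdf(t, n, lam)
via_density = integrate(lambda x: gamma_density(x, n, lam), 0.0, t)

# The upper limit 40 truncates a negligible tail for these parameters.
mean = integrate(lambda x: x * gamma_density(x, n, lam), 0.0, 40.0)
second_moment = integrate(lambda x: x * x * gamma_density(x, n, lam), 0.0, 40.0)
variance = second_moment - mean ** 2

print(via_poisson_sum, via_density)
print(mean, n / lam)
print(variance, n / lam ** 2)
```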


5.6.2 The Weibull Distribution 


The Weibull distribution is widely used in engineering practice due to its versatil- 
ity. It was originally proposed for the interpretation of fatigue data, but now its use 
has been extended to many other engineering problems. In particular, it is widely 
used in the field of life phenomena as the distribution of the lifetime of some object, 
especially when the “weakest link” model is appropriate for the object. That is, con- 
sider an object consisting of many parts, and suppose that the object experiences 
death (failure) when any of its parts fails. It has been shown (both theoretically 
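and empirically) that under these conditions, a Weibull distribution provides a close
approximation to the distribution of the lifetime of the item.

The Weibull distribution function has the form

\[
F(x) = \begin{cases} 0 & x \le \nu \\[1mm] 1 - \exp\left\{-\left(\dfrac{x - \nu}{\alpha}\right)^{\beta}\right\} & x > \nu \end{cases} \qquad (6.2)
\]

A random variable whose cumulative distribution function is given by Equation (6.2)
is said to be a Weibull random variable with parameters $\nu$, $\alpha$, and $\beta$. Differentiation
yields the density:

\[
f(x) = \begin{cases} 0 & x \le \nu \\[1mm] \dfrac{\beta}{\alpha}\left(\dfrac{x - \nu}{\alpha}\right)^{\beta - 1} \exp\left\{-\left(\dfrac{x - \nu}{\alpha}\right)^{\beta}\right\} & x > \nu \end{cases}
\]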


5.6.3 The Cauchy Distribution

A random variable is said to have a Cauchy distribution with parameter $\theta$,
$-\infty < \theta < \infty$, if its density is given by

\[
f(x) = \frac{1}{\pi}\,\frac{1}{1 + (x - \theta)^2}, \quad -\infty < x < \infty
\]

Suppose that a narrow-beam flashlight is spun around its center, which is located a
unit distance from the x-axis. (See Figure 5.7.) Consider the point $X$ at which the
beam intersects the x-axis when the flashlight has stopped spinning. (If the beam is
not pointing toward the x-axis, repeat the experiment.)

Figure 5.7

As indicated in Figure 5.7, the point $X$ is determined by the angle $\theta$ between the
flashlight and the y-axis, which, from the physical situation, appears to be uniformly
distributed between $-\pi/2$ and $\pi/2$. The distribution function of $X$ is thus given by

\[
\begin{aligned}
F(x) &= P\{X \le x\} \\
&= P\{\tan\theta \le x\} \\
&= P\{\theta \le \tan^{-1} x\} \\
&= \frac{1}{2} + \frac{1}{\pi}\tan^{-1} x
\end{aligned}
\]

where the last equality follows since $\theta$, being uniform over $(-\pi/2, \pi/2)$, has
distribution

\[
P\{\theta \le a\} = \frac{a - (-\pi/2)}{\pi} = \frac{1}{2} + \frac{a}{\pi}, \quad -\frac{\pi}{2} < a < \frac{\pi}{2}
\]

Hence, the density function of $X$ is given by

\[
f(x) = \frac{d}{dx}F(x) = \frac{1}{\pi(1 + x^2)}, \quad -\infty < x < \infty
\]

and we see that $X$ has the Cauchy distribution.†
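
The flashlight construction is easy to simulate. The following sketch is illustrative only: the sample size, seed, and function names are assumptions. It draws the angle uniformly on $(-\pi/2, \pi/2)$, takes its tangent, and compares the empirical distribution with $F(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}x$; it also differentiates $F$ numerically to recover the density.

```python
import math
import random

def cauchy_cdf(x):
    """Distribution function F(x) = 1/2 + arctan(x) / pi."""
    return 0.5 + math.atan(x) / math.pi

def cauchy_density(x):
    """Density f(x) = 1 / (pi * (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))

def central_difference(f, x, h=1e-5):
    """Numerical derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def simulate_flashlight(num_spins, seed=0):
    """Intersection points X = tan(theta) with theta uniform on (-pi/2, pi/2)."""
    rng = random.Random(seed)
    return [math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(num_spins)]

samples = simulate_flashlight(100_000)
empirical = sum(x <= 1.0 for x in samples) / len(samples)
print(empirical, cauchy_cdf(1.0))

for x in (-3.0, 0.0, 0.7, 10.0):
    print(x, central_difference(cauchy_cdf, x), cauchy_density(x))
```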


5.6.4 The Beta Distribution


A random variable is said to have a beta distribution if its density is given by

\[
f(x) = \begin{cases} \dfrac{1}{B(a, b)}\, x^{a-1}(1 - x)^{b-1} & 0 < x < 1 \\[2mm] 0 & \text{otherwise} \end{cases}
\]

where

\[
B(a, b) = \int_0^1 x^{a-1}(1 - x)^{b-1}\,dx
\]

The beta distribution can be used to model a random phenomenon whose set
of possible values is some finite interval $[c, d]$, which, by letting $c$ denote the origin
and taking $d - c$ as a unit measurement, can be transformed into the interval $[0, 1]$.

When $a = b$, the beta density is symmetric about $\frac{1}{2}$, giving more and more
weight to regions about $\frac{1}{2}$ as the common value $a$ increases. When $a = b = 1$, the
beta distribution reduces to the uniform (0, 1) distribution. (See Figure 5.8.) When
$b > a$, the density is skewed to the left (in the sense that smaller values become more
likely), and it is skewed to the right when $a > b$. (See Figure 5.9.)

The relationship

\[
B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)} \qquad (6.3)
\]

can be shown to exist between

\[
B(a, b) = \int_0^1 x^{a-1}(1 - x)^{b-1}\,dx
\]

and the gamma function.

Using Equation (6.3) along with the identity $\Gamma(x + 1) = x\Gamma(x)$, which was given
in Equation (6.1), it follows that

\[
\frac{B(a + 1, b)}{B(a, b)} = \frac{\Gamma(a + 1)\Gamma(b)}{\Gamma(a + b + 1)} \cdot \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} = \frac{a}{a + b}
\]
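† That $\frac{d}{dx}(\tan^{-1} x) = 1/(1 + x^2)$ can be seen as follows: If $y = \tan^{-1} x$, then $\tan y = x$, so

\[
1 = \frac{d}{dx}(\tan y) = \frac{d}{dy}(\tan y)\frac{dy}{dx} = \frac{d}{dy}\left(\frac{\sin y}{\cos y}\right)\frac{dy}{dx} = \left(\frac{\cos^2 y + \sin^2 y}{\cos^2 y}\right)\frac{dy}{dx}
\]

or

\[
\frac{dy}{dx} = \frac{\cos^2 y}{\sin^2 y + \cos^2 y} = \frac{1}{\tan^2 y + 1} = \frac{1}{x^2 + 1}
\]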

Figure 5.8 Beta densities with parameters (a, b) when a = b.

Figure 5.9 Beta densities with parameters (a, b) when a/(a + b) = 1/20.


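The preceding enables us to easily derive the mean and variance of a beta random
variable with parameters $a$ and $b$. For if $X$ is such a random variable, then

\[
\begin{aligned}
E[X] &= \frac{1}{B(a, b)} \int_0^1 x^{a}(1 - x)^{b-1}\,dx \\
&= \frac{B(a + 1, b)}{B(a, b)} \\
&= \frac{a}{a + b}
\end{aligned}
\]

Similarly, it follows that

\[
\begin{aligned}
E[X^2] &= \frac{1}{B(a, b)} \int_0^1 x^{a+1}(1 - x)^{b-1}\,dx \\
&= \frac{B(a + 2, b)}{B(a, b)} \\
&= \frac{B(a + 2, b)}{B(a + 1, b)} \cdot \frac{B(a + 1, b)}{B(a, b)} \\
&= \frac{(a + 1)a}{(a + b + 1)(a + b)}
\end{aligned}
\]

The identity $\mathrm{Var}(X) = E[X^2] - (E[X])^2$ now yields

\[
\begin{aligned}
\mathrm{Var}(X) &= \frac{a(a + 1)}{(a + b)(a + b + 1)} - \left(\frac{a}{a + b}\right)^2 \\
&= \frac{ab}{(a + b)^2(a + b + 1)}
\end{aligned}
\]

Remark A verification of Equation (6.3) appears in Example 7c of Chapter 6.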
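
The beta formulas above can be verified numerically for a particular pair of parameters. This is a sketch under stated assumptions: the values $a = 3$, $b = 6$ and the midpoint-rule helper are illustrative choices, and $B(a, b)$ is computed through Equation (6.3).

```python
import math

def beta_fn(a, b):
    """B(a, b) computed from the gamma function, as in Equation (6.3)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_density(x, a, b):
    """Beta density on (0, 1); zero elsewhere."""
    if not 0.0 < x < 1.0:
        return 0.0
    return x ** (a - 1) * (1.0 - x) ** (b - 1) / beta_fn(a, b)

def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

a, b = 3.0, 6.0   # illustrative parameters (assumed)

total_mass = integrate(lambda x: beta_density(x, a, b), 0.0, 1.0)
mean = integrate(lambda x: x * beta_density(x, a, b), 0.0, 1.0)
second_moment = integrate(lambda x: x * x * beta_density(x, a, b), 0.0, 1.0)
variance = second_moment - mean ** 2

print(total_mass)
print(mean, a / (a + b))
print(variance, a * b / ((a + b) ** 2 * (a + b + 1)))
```

That the total mass comes out as 1 is itself a numerical check of Equation (6.3), since the normalizing constant was computed from the gamma function rather than from the integral.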


5.6.5 The Pareto Distribution 


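If $X$ is an exponential random variable with rate $\lambda$ and $a > 0$, then

\[
Y = ae^{X}
\]

is said to be a Pareto random variable with parameters $a$ and $\lambda$. The parameter
$\lambda > 0$ is called the index parameter, and $a$ is called the minimum parameter (because
$P\{Y > a\} = 1$). The distribution function of $Y$ is derived as follows: For $y \ge a$,

\[
\begin{aligned}
P\{Y > y\} &= P\{ae^{X} > y\} \\
&= P\{e^{X} > y/a\} \\
&= P\{X > \log(y/a)\} \\
&= e^{-\lambda \log(y/a)} \\
&= e^{-\log((y/a)^{\lambda})} \\
&= (a/y)^{\lambda}
\end{aligned}
\]

Hence, the distribution function of $Y$ is

\[
F_Y(y) = 1 - P\{Y > y\} = 1 - a^{\lambda} y^{-\lambda}, \quad y \ge a
\]

Differentiating the distribution function yields the density function of $Y$:

\[
f_Y(y) = \lambda a^{\lambda} y^{-(\lambda + 1)}, \quad y \ge a
\]

When $\lambda \le 1$ it is easily checked that $E[Y] = \infty$. When $\lambda > 1$,

\[
\begin{aligned}
E[Y] &= \int_a^\infty \lambda a^{\lambda} y^{-\lambda}\,dy \\
&= \lambda a^{\lambda}\, \frac{y^{1-\lambda}}{1 - \lambda}\Big|_a^\infty \\
&= \lambda a^{\lambda}\, \frac{a^{1-\lambda}}{\lambda - 1} \\
&= \frac{\lambda}{\lambda - 1}\,a
\end{aligned}
\]

$E[Y^2]$ will be finite only when $\lambda > 2$. In this case,

\[
\begin{aligned}
E[Y^2] &= \int_a^\infty \lambda a^{\lambda} y^{1-\lambda}\,dy \\
&= \lambda a^{\lambda}\, \frac{y^{2-\lambda}}{2 - \lambda}\Big|_a^\infty \\
&= \frac{\lambda}{\lambda - 2}\,a^2
\end{aligned}
\]

Hence, when $\lambda > 2$,

\[
\mathrm{Var}(Y) = \frac{\lambda}{\lambda - 2}\,a^2 - \frac{\lambda^2}{(\lambda - 1)^2}\,a^2 = \frac{\lambda a^2}{(\lambda - 2)(\lambda - 1)^2}
\]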


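Remarks (a) We could also have derived the moments of $Y$ by using the representation
$Y = ae^{X}$, where $X$ is exponential with rate $\lambda$. This yields, for $\lambda > n$,

\[
E[Y^n] = a^n E[e^{nX}] = a^n \int_0^\infty e^{nx} \lambda e^{-\lambda x}\,dx = a^n \lambda \int_0^\infty e^{-(\lambda - n)x}\,dx = \frac{\lambda a^n}{\lambda - n}
\]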


(b) Where the density function f(y) of the Pareto is positive (that is, when y > a) 
it is a constant times a power of y, and for this reason it is called a power law density. 

(c) The Pareto distribution has been found to be useful in applications relating 
to such things as 


(i) the income or wealth of members of a population; 

(ii) the file size of internet traffic (under the TCP protocol); 
(iii) the time to complete a job assigned to a supercomputer;
(iv) the size of a meteorite; 

(v) the yearly maximum one day rainfalls in different regions. 


Further properties of the Pareto distribution will be developed in later chapters. 
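
The representation $Y = ae^{X}$ gives a direct way to simulate a Pareto random variable. The sketch below is illustrative: the parameter values, sample size, and seed are assumptions, and the function names are not from the text. It compares a simulated mean and tail probability with the formulas derived in this section.

```python
import math
import random

def pareto_sample(a, lam, rng):
    """Y = a * e^X, where X is exponential with rate lam."""
    return a * math.exp(rng.expovariate(lam))

def pareto_mean(a, lam):
    """E[Y] = lam * a / (lam - 1) when lam > 1, and infinity otherwise."""
    return lam * a / (lam - 1.0) if lam > 1.0 else math.inf

def pareto_variance(a, lam):
    """Var(Y) = lam * a^2 / ((lam - 2) * (lam - 1)^2), valid for lam > 2."""
    if lam <= 2.0:
        raise ValueError("variance is finite only when lam > 2")
    return lam * a * a / ((lam - 2.0) * (lam - 1.0) ** 2)

def pareto_tail(y, a, lam):
    """P{Y > y} = (a / y)^lam for y >= a, and 1 for y < a."""
    return (a / y) ** lam if y >= a else 1.0

a, lam = 2.0, 5.0                 # illustrative parameters (assumed)
rng = random.Random(2024)         # fixed seed so the run is repeatable
ys = [pareto_sample(a, lam, rng) for _ in range(200_000)]

sample_mean = sum(ys) / len(ys)
sample_tail = sum(y > 3.0 for y in ys) / len(ys)

print(sample_mean, pareto_mean(a, lam))
print(sample_tail, pareto_tail(3.0, a, lam))
print(pareto_variance(a, lam))
```

A fairly large index $\lambda$ is used here so that the variance is finite and the sample mean settles quickly; for $\lambda \le 2$ the sample mean converges much more slowly, if at all.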


5.7 The Distribution of a Function of a Random Variable


Often, we know the probability distribution of a random variable and are interested
in determining the distribution of some function of it. For instance, suppose that we
know the distribution of $X$ and want to find the distribution of $g(X)$. To do so, it is
necessary to express the event that $g(X) \le y$ in terms of $X$ being in some set. We
illustrate with the following examples.

Example 7a
Let $X$ be uniformly distributed over (0, 1). We obtain the distribution of the random
variable $Y$, defined by $Y = X^n$, as follows: For $0 \le y \le 1$,

\[
\begin{aligned}
F_Y(y) &= P\{Y \le y\} \\
&= P\{X^n \le y\} \\
&= P\{X \le y^{1/n}\} \\
&= F_X(y^{1/n}) \\
&= y^{1/n}
\end{aligned}
\]

For instance, the density function of $Y$ is given by

\[
f_Y(y) = \begin{cases} \dfrac{1}{n}\, y^{1/n - 1} & 0 \le y \le 1 \\[2mm] 0 & \text{otherwise} \end{cases}
\]

Example 7b
If $X$ is a continuous random variable with probability density $f_X$, then the distribution
of $Y = X^2$ is obtained as follows: For $y \ge 0$,

\[
\begin{aligned}
F_Y(y) &= P\{Y \le y\} \\
&= P\{X^2 \le y\} \\
&= P\{-\sqrt{y} \le X \le \sqrt{y}\} \\
&= F_X(\sqrt{y}) - F_X(-\sqrt{y})
\end{aligned}
\]

Differentiation yields

\[
f_Y(y) = \frac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right]
\]

Example 7c
If $X$ has a probability density $f_X$, then $Y = |X|$ has a density function that is obtained
as follows: For $y \ge 0$,

\[
\begin{aligned}
F_Y(y) &= P\{Y \le y\} \\
&= P\{|X| \le y\} \\
&= P\{-y \le X \le y\} \\
&= F_X(y) - F_X(-y)
\end{aligned}
\]

Hence, on differentiation, we obtain

\[
f_Y(y) = f_X(y) + f_X(-y), \quad y \ge 0
\]
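
The result of Example 7c can be checked numerically for a density that is not symmetric about 0. The sketch below is illustrative: it takes $X$ to be normal with mean 1 and standard deviation 2 (assumed values), integrates $f_Y(y) = f_X(y) + f_X(-y)$ over $[0, c]$ with a midpoint rule, and compares the result with $F_X(c) - F_X(-c)$.

```python
import math

MU, SIGMA = 1.0, 2.0   # illustrative normal parameters (assumed)

def f_x(x):
    """Density of X, normal with mean MU and standard deviation SIGMA."""
    return math.exp(-((x - MU) ** 2) / (2.0 * SIGMA ** 2)) / (math.sqrt(2.0 * math.pi) * SIGMA)

def F_x(x):
    """Distribution function of X, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - MU) / (SIGMA * math.sqrt(2.0))))

def f_abs(y):
    """Density of Y = |X| from Example 7c."""
    return f_x(y) + f_x(-y) if y >= 0.0 else 0.0

def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

c = 1.7   # illustrative evaluation point
from_density = integrate(f_abs, 0.0, c)
from_cdf = F_x(c) - F_x(-c)
print(from_density, from_cdf)
```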


The method employed in Examples 7a through 7c can be used to prove Theorem 7.1.

Theorem 7.1
Let $X$ be a continuous random variable having probability density function $f_X$.
Suppose that $g(x)$ is a strictly monotonic (increasing or decreasing), differentiable
(and thus continuous) function of $x$. Then the random variable $Y$ defined
by $Y = g(X)$ has a probability density function given by

\[
f_Y(y) = \begin{cases} f_X[g^{-1}(y)] \left|\dfrac{d}{dy}g^{-1}(y)\right| & \text{if } y = g(x) \text{ for some } x \\[2mm] 0 & \text{if } y \ne g(x) \text{ for all } x \end{cases}
\]

where $g^{-1}(y)$ is defined to equal that value of $x$ such that $g(x) = y$.

We shall prove Theorem 7.1 when $g(x)$ is an increasing function.

Proof Suppose that $y = g(x)$ for some $x$. Then, with $Y = g(X)$,

\[
\begin{aligned}
F_Y(y) &= P\{g(X) \le y\} \\
&= P\{X \le g^{-1}(y)\} \\
&= F_X(g^{-1}(y))
\end{aligned}
\]

Differentiation gives

\[
f_Y(y) = f_X(g^{-1}(y))\,\frac{d}{dy}g^{-1}(y)
\]

which agrees with Theorem 7.1, since $g^{-1}(y)$ is nondecreasing, so its derivative is
nonnegative.

When $y \ne g(x)$ for any $x$, then $F_Y(y)$ is either 0 or 1, and in either case $f_Y(y) = 0$.

Example 7d
Let $X$ be a continuous nonnegative random variable with density function $f$, and let
$Y = X^n$. Find $f_Y$, the probability density function of $Y$.

Solution If $g(x) = x^n$, then

\[
g^{-1}(y) = y^{1/n}
\]

and

\[
\frac{d}{dy}\{g^{-1}(y)\} = \frac{1}{n}\, y^{1/n - 1}
\]

Hence, from Theorem 7.1, we obtain, for $y \ge 0$,

\[
f_Y(y) = \frac{1}{n}\, y^{1/n - 1} f(y^{1/n})
\]

For $n = 2$, this gives

\[
f_Y(y) = \frac{1}{2\sqrt{y}}\, f(\sqrt{y})
\]

which (since $X \ge 0$) is in agreement with the result of Example 7b.

Example 7e
The Lognormal Distribution If $X$ is a normal random variable with mean $\mu$ and
variance $\sigma^2$, then the random variable

\[
Y = e^{X}
\]

is said to be a lognormal random variable with parameters $\mu$ and $\sigma^2$. Thus, a random
variable $Y$ is lognormal if $\log(Y)$ is a normal random variable. The lognormal is often
used as the distribution of the ratio of the price of a security at the end of one day
to its price at the end of the prior day. That is, if $S_n$ is the price of some security at
the end of day $n$, then it is often supposed that $\frac{S_n}{S_{n-1}}$ is a lognormal random variable,
implying that $X \equiv \log\left(\frac{S_n}{S_{n-1}}\right)$ is normal. Thus, to assume that $\frac{S_n}{S_{n-1}}$ is lognormal is to
assume that

\[
S_n = S_{n-1} e^{X}
\]

where $X$ is normal.

Let us now use Theorem 7.1 to derive the density of a lognormal random variable
$Y$ with parameters $\mu$ and $\sigma^2$. Because $Y = e^{X}$, where $X$ is normal with mean $\mu$ and
variance $\sigma^2$, we need to determine the inverse of the function $g(x) = e^{x}$. Because

\[
y = g(g^{-1}(y)) = e^{g^{-1}(y)}
\]

we obtain upon taking logarithms that

\[
g^{-1}(y) = \log(y)
\]

Using that $\frac{d}{dy}g^{-1}(y) = 1/y$, Theorem 7.1 yields the density:

\[
f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma y} \exp\left\{-(\log(y) - \mu)^2 / 2\sigma^2\right\}, \quad y > 0
\]
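
Theorem 7.1 translates almost directly into code. The following is a sketch under stated assumptions: `transformed_density` is a hypothetical helper, the values of $\mu$ and $\sigma$ are arbitrary, and the caller is responsible for evaluating the result only at points $y$ in the range of $g$. It builds the lognormal density from the normal density with $g^{-1}(y) = \log(y)$ and checks it against the lognormal distribution function $\Phi((\log y - \mu)/\sigma)$.

```python
import math

def transformed_density(f_x, g_inverse, g_inverse_derivative):
    """Density of Y = g(X) for strictly monotonic, differentiable g (Theorem 7.1).

    The returned function is meant to be evaluated only at y in the range of g.
    """
    return lambda y: f_x(g_inverse(y)) * abs(g_inverse_derivative(y))

MU, SIGMA = 0.1, 0.5   # illustrative parameters (assumed)

def normal_density(x):
    return math.exp(-((x - MU) ** 2) / (2.0 * SIGMA ** 2)) / (math.sqrt(2.0 * math.pi) * SIGMA)

def lognormal_closed_form(y):
    """Lognormal density as stated in the text, for y > 0."""
    return math.exp(-((math.log(y) - MU) ** 2) / (2.0 * SIGMA ** 2)) / (math.sqrt(2.0 * math.pi) * SIGMA * y)

# g(x) = e^x, so g^{-1}(y) = log(y) and its derivative is 1 / y.
lognormal_via_theorem = transformed_density(normal_density, math.log, lambda y: 1.0 / y)

def integrate(f, lo, hi, steps=20_000):
    """Midpoint rule; the midpoints never touch y = 0, where log is undefined."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

y0 = 2.0
mass_below_y0 = integrate(lognormal_via_theorem, 0.0, y0)
target = 0.5 * (1.0 + math.erf((math.log(y0) - MU) / (SIGMA * math.sqrt(2.0))))
print(mass_below_y0, target)
```

The same helper applies to Example 7d by passing `lambda y: y ** (1 / n)` as the inverse and its derivative `lambda y: y ** (1 / n - 1) / n`.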


Summary 


A random variable $X$ is continuous if there is a nonnegative
function $f$, called the probability density function of $X$,
such that, for any set $B$,

\[
P\{X \in B\} = \int_B f(x)\,dx
\]

If $X$ is continuous, then its distribution function $F$ will be
differentiable and

\[
\frac{d}{dx}F(x) = f(x)
\]

The expected value of a continuous random variable $X$ is
defined by

\[
E[X] = \int_{-\infty}^{\infty} x f(x)\,dx
\]

A useful identity is that for any function $g$,

\[
E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx
\]

As in the case of a discrete random variable, the variance
of $X$ is defined by

\[
\mathrm{Var}(X) = E[(X - E[X])^2]
\]

A random variable $X$ is said to be uniform over the interval
$(a, b)$ if its probability density function is given by

\[
f(x) = \begin{cases} \dfrac{1}{b - a} & a \le x \le b \\[2mm] 0 & \text{otherwise} \end{cases}
\]

Its expected value and variance are

\[
E[X] = \frac{a + b}{2} \qquad \mathrm{Var}(X) = \frac{(b - a)^2}{12}
\]




A random variable $X$ is said to be normal with parameters
$\mu$ and $\sigma^2$ if its probability density function is given by

\[
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2}, \quad -\infty < x < \infty
\]

It can be shown that

\[
\mu = E[X] \qquad \sigma^2 = \mathrm{Var}(X)
\]

If $X$ is normal with mean $\mu$ and variance $\sigma^2$, then $Z$,
defined by

\[
Z = \frac{X - \mu}{\sigma}
\]

is normal with mean 0 and variance 1. Such a random
variable is said to be a standard normal random variable.
Probabilities about $X$ can be expressed in terms of
probabilities about the standard normal variable $Z$, whose
probability distribution function can be obtained either
from Table 5.1, the normal calculator on StatCrunch, or
a website.

When $n$ is large, the probability distribution function
of a binomial random variable with parameters $n$ and $p$
can be approximated by that of a normal random variable
having mean $np$ and variance $np(1 - p)$.

A random variable whose probability density function
is of the form

\[
f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}
\]

is said to be an exponential random variable with parameter
$\lambda$. Its expected value and variance are, respectively,

\[
E[X] = \frac{1}{\lambda} \qquad \mathrm{Var}(X) = \frac{1}{\lambda^2}
\]


A key property possessed only by exponential random
variables is that they are memoryless, in the sense that, for
positive $s$ and $t$,

\[
P\{X > s + t \mid X > t\} = P\{X > s\}
\]

If $X$ represents the life of an item, then the memoryless
property states that for any $t$, the remaining life of a $t$-year-old
item has the same probability distribution as the life of
a new item. Thus, one need not remember the age of an
item to know its distribution of remaining life.

Let $X$ be a nonnegative continuous random variable
with distribution function $F$ and density function $f$. The
function

\[
\lambda(t) = \frac{f(t)}{1 - F(t)}, \quad t \ge 0
\]

is called the hazard rate, or failure rate, function of $F$. If
we interpret $X$ as being the life of an item, then for small
values of $dt$, $\lambda(t)\,dt$ is approximately the probability that a
$t$-unit-old item will fail within an additional time $dt$. If $F$ is
the exponential distribution with parameter $\lambda$, then

\[
\lambda(t) = \lambda, \quad t \ge 0
\]

In addition, the exponential is the unique distribution having
a constant failure rate.

A random variable is said to have a gamma distribution
with parameters $\alpha$ and $\lambda$ if its probability density
function is equal to

\[
f(x) = \frac{\lambda e^{-\lambda x}(\lambda x)^{\alpha - 1}}{\Gamma(\alpha)}, \quad x \ge 0
\]


Problems 


5.1. Let X be a random variable with probability density 
function 


e(x - 3) 2<x<4 
0 otherwise 


rore| 


(a) What is the value of c? 
(b) What is the cumulative distribution function of X? 


5.2. A group of construction workers take time X (in 
hours) to finish a task. The density function of time X is 


What is the probability that the workers will take more 
than 10 hours to complete the task? 


x20 
otherwise 


5.3. For positive c, could the function 


2 
f= fi 


— 5x) O0O<x< e 
otherwise 


and is 0 otherwise. The quantity $\Gamma(\alpha)$ is called the gamma
function and is defined by

\[
\Gamma(\alpha) = \int_0^\infty e^{-x} x^{\alpha - 1}\,dx
\]

The expected value and variance of a gamma random variable
are, respectively,

\[
E[X] = \frac{\alpha}{\lambda} \qquad \mathrm{Var}(X) = \frac{\alpha}{\lambda^2}
\]

A random variable is said to have a beta distribution
with parameters $(a, b)$ if its probability density function
is equal to

\[
f(x) = \frac{1}{B(a, b)}\, x^{a-1}(1 - x)^{b-1}, \quad 0 \le x \le 1
\]

and is equal to 0 otherwise. The constant $B(a, b)$ is given by

\[
B(a, b) = \int_0^1 x^{a-1}(1 - x)^{b-1}\,dx
\]

The mean and variance of such a random variable are,
respectively,

\[
E[X] = \frac{a}{a + b} \qquad \mathrm{Var}(X) = \frac{ab}{(a + b)^2(a + b + 1)}
\]


be a probability density function? If yes, determine the 
value of c. Repeat for 


f@= as — 5x) : 
~ 10 


1l<x< 3 
otherwise 


5.4. The probability density function of X, the lifetime of 
a certain type of electronic device (measured in hours), is 
given by 


\[
f(x) = \begin{cases} \dfrac{10}{x^2} & x > 10 \\[2mm] 0 & x \le 10 \end{cases}
\]


(a) Find P{X > 20}. 
(b) What is the cumulative distribution function of X? 


(c) What is the probability that of 6 such types of devices, 
at least 3 will function for at least 15 hours? What assump- 
tions are you making? 


5.5. A filling station is supplied with gasoline once a week. 
If its weekly volume of sales in thousands of gallons is a 


random variable with probability density function

\[
f(x) = \begin{cases} 5(1 - x)^4 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}
\]


what must the capacity of the tank be so that the prob- 
ability of the supply being exhausted in a given week 
is .01? 


5.6. Compute $E[X]$ if $X$ has a density function given by


(a) f(x) pe? x >0 
a) f(x) = : 
0 otherwise 
— Jed — x) -1l<x<1l, 
)7G)= f otherwise’ 
2 5 
z+ x> 
(c) f(x) = 4 x? . 
0 x=5 


5.7. The density function of X is given by 


1sxs3 
otherwise 


3 
f= ( + bx 


If E(X) =5, find a and b. 


5.8. The wind speed, measured in miles per hour, expe- 
rienced at a particular site is a random variable having a 
probability density function given by 


3 


f(x) =3x7e x>0 


What is the expected wind velocity? 


5.9. Consider Example 4b of Chapter 4, but now suppose 
that the seasonal demand is a continuous random variable 
having probability density function $f$. Show that the optimal
amount to stock is the value $s^*$ that satisfies

\[
F(s^*) = \frac{b}{b + \ell}
\]

where $b$ is net profit per unit sale, $\ell$ is the net loss per unit
unsold, and $F$ is the cumulative distribution function of the
seasonal demand.


5.10. Trains headed for destination A arrive at the train 
station at 15-minute intervals starting at 7 A.M., whereas 
trains headed for destination B arrive at 15-minute inter- 
vals starting at 7:05 A.M. 


(a) If a certain passenger arrives at the station at a time 
uniformly distributed between 7 and 8 A.M. and then gets 
on the first train that arrives, what proportion of time does 
he or she go to destination A? 

(b) What if the passenger arrives at a time uniformly dis- 
tributed between 7:10 and 8:10 A.M.? 


5.11. A point is chosen at random on a line segment of
length L. Interpret this statement, and find the probability 
that the ratio of the shorter to the longer segment is less 
than i 


5.12. A bus travels between the two cities A and B, which 
are 100 miles apart. If the bus has a breakdown, the dis- 
tance from the breakdown to city A has a uniform distri- 
bution over (0, 100). There is a bus service station in city A, 
in B, and in the center of the route between A and B. It is 
suggested that it would be more efficient to have the three 
stations located 25, 50, and 75 miles, respectively, from A. 
Do you agree? Why? 


5.13. A shuttle train completes the journey from an airport 
to a nearby city and back every 15 minutes. 


(a) If the waiting time has a uniform distribution, what is 
the probability that a passenger has to wait more than 6 
minutes for a shuttle train? 

(b) Given that a passenger has already waited for 8 min- 
utes, what is the probability that he or she has to wait an 
additional 2 minutes or more for a shuttle train? 


5.14. Let X be a uniform (0, 1) random variable. Compute 
E[X"] by using Proposition 2.1, and then check the result 
by using the definition of expectation. 


5.15. The height X, in centimeters, of adult women is nor- 
mally distributed with mean 165 centimeters and standard 
deviation 6.5 centimeters. Compute 


(a) P{X > 160};
(b) P{163 < X < 167}; 
(c) P{X < 164}; 
(d) P{X > 171}; 
(e) P{X < 168}. 


5.16. The annual rainfall (in inches) in a certain region is 
normally distributed with $\mu = 40$ and $\sigma = 4$. What is the
probability that starting with this year, it will take more 
than 10 years before a year occurs having a rainfall of more 
than 50 inches? What assumptions are you making? 


5.17. The salaries of physicians in a certain speciality are 
approximately normally distributed. If 25 percent of these 
physicians earn less than $180,000 and 25 percent earn 
more than $320,000, approximately what fraction earn 


(a) less than $200,000? 
(b) between $280,000 and $320,000? 


5.18. Suppose that X is a normally distributed random 
variable with mean $\mu$ and variance $\sigma^2$. If P{X < 10} = .67
and P{X < 20} = .975, approximate $\mu$ and $\sigma$.


5.19. Let $X$ be an exponentially distributed random variable
with mean 1/2. Find the value of $c$ for which
P{X > c} = .25.


5.20. In a city, 55 percent of the population is in favor of
constructing a new shopping center. If a random sample of 
400 people is selected, find the probability that 


(a) at least 200 support the construction of the new shop- 
ping center; 

(b) between 250 and 350 people support the construction
of the new shopping center;

(c) at most 225 support the construction of the new shop- 
ping center. 


5.21. The weight of a group of people is independently 
and normally distributed, with mean 70 kg and standard 
deviation 4 kg. What percentage of individuals from this 
group weigh less than 75 kg? What percentage of individ- 
uals from this group weigh more than 67 kg? 


5.22. Tom is throwing darts at a dartboard repeatedly. 
Each of his throws, independently of all previous throws, 
has a success probability of .05 of hitting the bullseye. 
What is the approximate probability that Tom takes more 
than 50 throws to hit the bullseye? 


5.23. A card is picked at random from a shuffled card deck 
for 500 consecutive times. 


(a) What is the approximate probability that a red card will 
be picked between 250 and 300 times inclusively? 

(b) What is the approximate probability that an even- 
numbered card will be picked more than 200 times? 


5.24. The lifetimes of interactive computer chips produced
by a certain semiconductor manufacturer are normally distributed
with parameters $\mu = 1.4 \times 10^6$ hours and $\sigma = 3 \times 10^5$ hours.
What is the approximate probability that a
batch of 100 chips will contain at least 20 whose lifetimes
are less than $1.8 \times 10^6$?


5.25. A die is biased in such a way that even numbers are 
three times as likely to be rolled as odd numbers. Approxi- 
mate the probability that the number 5 will appear at most 
15 times in 100 throws. 


5.26. Two types of coins are produced at a factory: a fair 
coin and a biased one that comes up heads 55 percent of 
the time. We have one of these coins but do not know 
whether it is a fair coin or a biased one. In order to ascer- 
tain which type of coin we have, we shall perform the fol- 
lowing statistical test: We shall toss the coin 1000 times. If 
the coin lands on heads 525 or more times, then we shall 
conclude that it is a biased coin, whereas if it lands on 
heads fewer than 525 times, then we shall conclude that 
it is a fair coin. If the coin is actually fair, what is the prob- 
ability that we shall reach a false conclusion? What would 
it be if the coin were biased? 


5.27. In 10,000 independent tosses of a coin, the coin 
landed on heads 5800 times. Is it reasonable to assume that 
the coin is not fair? Explain. 


5.28. About 17 percent of the world’s population has blue 
eyes. What is the approximate probability of spotting at 
least 40 blue-eyed individuals in a crowd of 300 people? 
State your assumptions. 


5.29. A model for the movement of a stock supposes that 
if the present price of the stock is s, then after one period, 
it will be either us with probability p or ds with probability 
1 — p. Assuming that successive movements are indepen- 
dent, approximate the probability that the stock’s price 
will be up at least 30 percent after the next 1000 periods if 
u = 1.012,d = .990, and p = .52. 


5.30. An image is partitioned into two regions, one white 
and the other black. A reading taken from a randomly cho- 
sen point in the white section will be normally distributed 
with $\mu = 4$ and $\sigma^2 = 4$, whereas one taken from a randomly
chosen point in the black region will have a normally
distributed reading with parameters (6, 9). A point
is randomly chosen on the image and has a reading of 5. If 
the fraction of the image that is black is a, for what value 
of a would the probability of making an error be the same, 
regardless of whether one concluded that the point was in 
the black region or in the white region? 


5.31. (a) A fire station is to be located along a road of
length $A$, $A < \infty$. If fires occur at points uniformly chosen
on $(0, A)$, where should the station be located so as
to minimize the expected distance from the fire? That is,
choose $a$ so as to

minimize $E[|X - a|]$

when $X$ is uniformly distributed over $(0, A)$.

(b) Now suppose that the road is of infinite length,
stretching from point 0 outward to $\infty$. If the distance of
a fire from point 0 is exponentially distributed with rate $\lambda$,
where should the fire station now be located? That is, we
want to minimize $E[|X - a|]$, where $X$ is now exponential
with rate $\lambda$.


5.32. The time X (in minutes) between customer arrivals 
at a bank is exponentially distributed with mean 1.5 
minutes. 


(a) If a customer has just arrived, what is the probability 
that no customer will arrive in the next 2 minutes? 

(b) What is the probability that no customer will arrive 
within the next minute, given that no customer had arrived 
in the past minute? 


5.33. Suppose that U is a uniform random variable on 
(0,1). What is the distribution of V = aU~i fora,r7 > 0? 


5.34. Jones figures that the total number of thousands of 
miles that a racing auto can be driven before it would 
need to be junked is an exponential random variable with 
parameter 30° Smith has a used car that he claims has been 


driven only 10,000 miles. If Jones purchases the car, what 
is the probability that she would get at least 20,000 addi- 
tional miles out of it? Repeat under the assumption that 
the lifetime mileage of the car is not exponentially dis- 
tributed, but rather is (in thousands of miles) uniformly 
distributed over (0, 40). 


5.35. Suppose that $X$ is an exponential random variable
with parameter $\lambda$. Find the probability density function of
$Y = \sqrt{X}$. What kind of random variable is $Y$?


5.36. The hazard rate $\lambda(t)$ of divorce after $t$ years of marriage
is such that


1 
ADW=— t>O0z 
= S. > 


What is the probability that a couple who have been mar- 
ried for 5 years will (a) celebrate their tenth wedding 
anniversary, and (b) celebrate their twenty-fifth wedding 
anniversary? 


5.37. Suppose that the lifetime of an electronic light bulb 
has the hazard rate function A() = .25 + .1412,t > 0. What 
is the probability that 


(a) the light bulb survives to age 4? 
(b) the light bulb’s lifetime is between 2.3 and 3.7 years? 
(c) a 2-year-old light bulb will survive to age 5? 


5.38. If X is uniformly distributed on (1, 5), find 
(a) the probability density function of Y = log(X); 
(b) P| 5 <Y< 3}. 


Theoretical Exercises 


5.1. The speed of a molecule in a uniform gas at equilib- 
rium is a random variable whose probability density func- 
tion is given by 


\[
f(x) = \begin{cases} a x^2 e^{-bx^2} & x \ge 0 \\ 0 & x < 0 \end{cases}
\]


where b = m/2kT and k, T, and m denote, respectively, 
Boltzmann’s constant, the absolute temperature of the gas, 
and the mass of the molecule. Evaluate a in terms of b. 


5.2. Show that
$$E[Y] = \int_0^\infty P\{Y > y\}\,dy - \int_0^\infty P\{Y < -y\}\,dy$$
Hint: Show that
$$\int_0^\infty P\{Y < -y\}\,dy = -\int_{-\infty}^0 x f_Y(x)\,dx$$
$$\int_0^\infty P\{Y > y\}\,dy = \int_0^\infty x f_Y(x)\,dx$$

5.39. If Y is uniformly distributed over (0, 5), what is the
probability that the roots of the equation $4x^2 + 4xY + Y + 2 = 0$
are both real?


5.40. Suppose that X is an exponential random variable 
with parameter 4. What is the probability density function 
of Y =e**? 


5.41. Suppose that X has a beta distribution with E[X] =
1/9 and Var(X) = 1/81. Find the parameters (a, b) corresponding
to X.


5.42. If X is uniformly distributed over (a,b),a < b, what 
is the probability density function of Y = cX + d for any 
constants c and d? 


5.43. Find the distribution of $R = A \sin\theta$, where A
is a fixed constant and $\theta$ is uniformly distributed on
$(-\pi/2, \pi/2)$. Such a random variable R arises in the theory
of ballistics. If a projectile is fired from the origin at
an angle $\alpha$ from the earth with a speed v, then the point
R at which it returns to the earth can be expressed as
$R = (v^2/g)\sin 2\alpha$, where g is the gravitational constant,
equal to 980 centimeters per second squared.


5.44. Let Y be a lognormal random variable (see Exam- 
ple 7e for its definition) and let c > 0 be a constant. 
Answer true or false to the following, and then give an 
explanation for your answer. 


(a) cY is lognormal; 
(b) c + Y is lognormal. 


5.3. Show that if X has density function f, then
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$
Hint: Using Theoretical Exercise 5.2, start with
$$E[g(X)] = \int_0^\infty P\{g(X) > y\}\,dy - \int_0^\infty P\{g(X) < -y\}\,dy$$
and then proceed as in the proof given in the text when
$g(X) \ge 0$.

5.4. Prove Corollary 2.1. 


5.5. Use the result that for a nonnegative random variable Y,
$$E[Y] = \int_0^\infty P\{Y > t\}\,dt$$
to show that for a nonnegative random variable X,
$$E[X^n] = \int_0^\infty n x^{n-1} P\{X > x\}\,dx$$
Hint: Start with
$$E[X^n] = \int_0^\infty P\{X^n > t\}\,dt$$
and make the change of variables $t = x^n$.

5.6. Define a collection of events $E_a$, $0 < a < 1$, having
the property that $P(E_a) = 1$ for all a but $P\left(\bigcap_a E_a\right) = 0$.


Hint: Let X be uniform over (0, 1) and define each E, in 
terms of X. 


5.7. The standard deviation of X, denoted SD(X), is
given by
$$SD(X) = \sqrt{\mathrm{Var}(X)}$$
Find SD(aX + b) if X has variance $\sigma^2$.


5.8. Let X be a random variable that takes on values
between 0 and c. That is, $P\{0 \le X \le c\} = 1$. Show that
$$\mathrm{Var}(X) \le \frac{c^2}{4}$$
Hint: One approach is to first argue that
$$E[X^2] \le cE[X]$$
and then use this inequality to show that
$$\mathrm{Var}(X) \le c^2[\alpha(1 - \alpha)] \quad \text{where } \alpha = \frac{E[X]}{c}$$


5.9. Show that if Z is a standard normal random variable,
then, for x > 0,


(a) P{Z > x} = P{Z < —x}; 
(b) P{|Z| > x} =2P{Z > x}; 
(c) P{|Z| < x} =2P{Z < x} — 1. 


5.10. Let f(x) denote the probability density function of
a normal random variable with mean $\mu$ and variance $\sigma^2$.
Show that $\mu - \sigma$ and $\mu + \sigma$ are points of inflection of this
function. That is, show that $f''(x) = 0$ when $x = \mu - \sigma$ or
$x = \mu + \sigma$.


5.11. Let Z be a standard normal random variable, and
let g be a differentiable function with derivative $g'$.

(a) Show that $E[g'(Z)] = E[Zg(Z)]$;

(b) Show that $E[Z^{n+1}] = nE[Z^{n-1}]$.

(c) Find $E[Z^4]$.

5.12. Use the identity of Theoretical Exercise 5.5 to derive
$E[X^2]$ when X is an exponential random variable with
parameter $\lambda$.


5.13. The median of a continuous random variable having
distribution function F is that value m such that F(m) = 1/2.
That is, a random variable is just as likely to be larger than
its median as it is to be smaller. Find the median of X if
X is

(a) uniformly distributed over (a, b);

(b) normal with parameters $\mu$, $\sigma^2$;

(c) exponential with rate $\lambda$.


5.14. The mode of a continuous random variable having 
density f is the value of x for which f(x) attains its maxi- 
mum. Compute the mode of X in cases (a), (b), and (c) of 
Theoretical Exercise 5.13. 


5.15. If X is an exponential random variable with parameter
$\lambda$, and c > 0, show that cX is exponential with parameter
$\lambda/c$.


5.16. Compute the hazard rate function of X when X is 
uniformly distributed over (0, a). 


5.17. If X has hazard rate function $\lambda_X(t)$, compute the hazard
rate function of aX where a is a positive constant.


5.18. Verify that the gamma density function integrates
to 1.


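5.19. If X is an exponential random variable with mean
$1/\lambda$, show that
$$E[X^k] = \frac{k!}{\lambda^k} \qquad k = 1, 2, \ldots$$
Hint: Make use of the gamma density function to evaluate
$E[X^k]$.


5.20. Verify that
$$\mathrm{Var}(X) = \frac{\alpha}{\lambda^2}$$
when X is a gamma random variable with parameters $\alpha$
and $\lambda$.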


5.21. Show that $\Gamma(1/2) = \sqrt{\pi}$.
Hint: $\Gamma(1/2) = \int_0^\infty e^{-x} x^{-1/2}\,dx$. Make the change of
variables $y = \sqrt{2x}$ and then relate the resulting expression to
the normal distribution.


5.22. Compute the hazard rate function of a gamma random
variable with parameters $(\alpha, \lambda)$ and show it is increasing
when $\alpha \ge 1$ and decreasing when $\alpha \le 1$.


5.23. Compute the hazard rate function of a Weibull random
variable and show it is increasing when $\beta \ge 1$ and
decreasing when $\beta \le 1$.


5.24. Show that a plot of $\log(\log(1 - F(x))^{-1})$ against $\log x$
will be a straight line with slope $\beta$ when $F(\cdot)$ is a Weibull
distribution function. Show also that approximately 63.2
percent of all observations from such a distribution will be
less than $\alpha$. Assume that $\nu = 0$.


5.25. Let
$$Y = \left(\frac{X - \nu}{\alpha}\right)^{\beta}$$
Show that if X is a Weibull random variable with parameters
$\nu$, $\alpha$, and $\beta$, then Y is an exponential random variable
with parameter $\lambda = 1$ and vice versa.


5.26. Let F be a continuous distribution function. If U is
uniformly distributed on (0, 1), find the distribution function
of $Y = F^{-1}(U)$, where $F^{-1}$ is the inverse function of
F. (That is, $y = F^{-1}(x)$ if $F(y) = x$.)


5.27. If X is uniformly distributed over (a, b), what ran- 
dom variable, having a linear relation with X, is uniformly 
distributed over (0, 1)? 


5.28. Consider the beta distribution with parameters 
(a, b). Show that 


(a) when a > 1 and b > 1, the density is unimodal (that 
is, it has a unique mode) with mode equal to (a — 1)/(a + 
b — 2); 

(b) when $a \le 1$, $b \le 1$, and $a + b < 2$, the density is either
unimodal with mode at 0 or 1 or U-shaped with modes at
both 0 and 1;

(c) when a = 1 = b, all points in [0, 1] are modes.


5.29. Let X be a continuous random variable having 
cumulative distribution function F. Define the random 
variable Y by Y = F(X). Show that Y is uniformly dis- 
tributed over (0, 1). 


5.30. Let X have probability density $f_X$. Find the probability
density function of the random variable Y defined
by Y = aX + b.


5.31. Find the probability density function of $Y = e^X$ when
X is normally distributed with parameters $\mu$ and $\sigma^2$. The
random variable Y is said to have a lognormal distribution
(since log Y has a normal distribution) with parameters $\mu$
and $\sigma^2$.


5.32. Let X and Y be independent random variables that
are both equally likely to be either $1, 2, \ldots, (10)^N$, where N
is very large. Let D denote the greatest common divisor of
X and Y, and let $Q_k = P\{D = k\}$.

(a) Give a heuristic argument that $Q_k = \frac{1}{k^2} Q_1$.
Hint: Note that in order for D to equal k, k must divide
both X and Y and also X/k and Y/k must be relatively
prime. (That is, X/k and Y/k must have a greatest common
divisor equal to 1.)

(b) Use part (a) to show that
$$Q_1 = P\{X \text{ and } Y \text{ are relatively prime}\} = \frac{1}{\sum_{k=1}^{\infty} 1/k^2}$$
It is a well-known identity that $\sum_{k=1}^{\infty} 1/k^2 = \pi^2/6$, so
$Q_1 = 6/\pi^2$. (In number theory, this is known as the Legendre
theorem.)

(c) Now argue that
$$Q_1 = \prod_{i=1}^{\infty} \left(\frac{P_i^2 - 1}{P_i^2}\right)$$
where $P_i$ is the ith-smallest prime greater than 1.
Hint: X and Y will be relatively prime if they have no common
prime factors. Hence, from part (b), we see that
$$\prod_{i=1}^{\infty} \left(\frac{P_i^2 - 1}{P_i^2}\right) = \frac{6}{\pi^2}$$


5.33. Prove Theorem 7.1 when g(x) is a decreasing
function.


Self-Test Problems and Exercises


5.1. The number of minutes of playing time of a certain
high school basketball player in a randomly chosen game
is a random variable whose probability density function is
given in the following figure:

[Figure: probability density of playing time; vertical axis marked at .025 and .050, horizontal axis marked at 10, 20, 30, and 40 minutes.]

Find the probability that the player plays

(a) more than 15 minutes;

(b) between 20 and 35 minutes;

(c) less than 30 minutes;

(d) more than 36 minutes.


5.2. For some constant c, the random variable X has the 
probability density function 


$$f(x) = \begin{cases} c x^n & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
Find (a) c and (b) $P\{X > x\}$, 0 < x < 1.


5.3. For some constant c, the random variable X has the 
probability density function 


$$f(x) = \begin{cases} c x^4 & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$

Find (a) E[X] and (b) Var(X).


5.4. The random variable X has the probability density
function
$$f(x) = \begin{cases} ax + bx^2 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
If E[X] = .6, find (a) $P\{X < 1/2\}$ and (b) Var(X).


5.5. The random variable X is said to be a discrete uniform
random variable on the integers 1, 2, ..., n if
$$P\{X = i\} = \frac{1}{n} \qquad i = 1, 2, \ldots, n$$
For any nonnegative real number x, let Int(x) (sometimes
written as [x]) be the largest integer that is less than or
equal to x. Show that if U is a uniform random variable on
(0, 1), then X = Int(nU) + 1 is a discrete uniform random
variable on 1, ..., n.


5.6. Your company must make a sealed bid for a construc- 
tion project. If you succeed in winning the contract (by 
having the lowest bid), then you plan to pay another firm 
$100,000 to do the work. If you believe that the minimum 
bid (in thousands of dollars) of the other participating 
companies can be modeled as the value of a random vari- 
able that is uniformly distributed on (70, 140), how much 
should you bid to maximize your expected profit? 


5.7. To be a winner in a certain game, you must be success- 
ful in three successive rounds. The game depends on the 
value of U, a uniform random variable on (0, 1). If U > .1, 
then you are successful in round 1; if U > .2, then you are 
successful in round 2; and if U > .3, then you are successful 
in round 3. 


(a) Find the probability that you are successful in round 1. 
(b) Find the conditional probability that you are successful 
in round 2 given that you were successful in round 1. 

(c) Find the conditional probability that you are success- 
ful in round 3 given that you were successful in rounds 1 
and 2. 


(d) Find the probability that you are a winner. 


5.8. A randomly chosen IQ test taker obtains a score that 
is approximately a normal random variable with mean 100 
and standard deviation 15. What is the probability that the 


score of such a person is (a) more than 125; (b) between 
90 and 110? 


5.9. Suppose that the travel time from your home to your 
office is normally distributed with mean 40 minutes and 
standard deviation 7 minutes. If you want to be 95 percent 
certain that you will not be late for an office appointment 
at 1 P.M., what is the latest time that you should leave 
home? 


5.10. The life of a certain type of automobile tire is nor- 
mally distributed with mean 34,000 miles and standard 
deviation 4000 miles. 


(a) What is the probability that such a tire lasts more than 
40,000 miles? 

(b) What is the probability that it lasts between 30,000 and 
35,000 miles? 

(c) Given that it has survived 30,000 miles, what is the con- 
ditional probability that the tire survives another 10,000 
miles? 


5.11. The annual rainfall in Cleveland, Ohio, is approxi- 
mately a normal random variable with mean 40.2 inches 
and standard deviation 8.4 inches. What is the probabil- 
ity that 


(a) next year’s rainfall will exceed 44 inches? 


(b) the yearly rainfalls in exactly 3 of the next 7 years will 
exceed 44 inches? 


Assume that if $A_i$ is the event that the rainfall exceeds 44
inches in year i (from now), then the events $A_i$, $i \ge 1$, are
independent.


5.12. The following table uses 1992 data concerning the 
percentages of male and female full-time workers whose 
annual salaries fall into different ranges: 


Earnings range     Percentage of females     Percentage of males
≤ 9999                      8.6                      4.4
10,000-19,999              38.0                     21.1
20,000-24,999              19.4                     15.8
25,000-49,999              29.2                     41.5
≥ 50,000                    4.8                     17.2


Suppose that random samples of 200 male and 200 female 
full-time workers are chosen. Approximate the probabil- 
ity that 

(a) at least 70 of the women earn $25,000 or more; 

(b) at most 60 percent of the men earn $25,000 or more; 


(c) at least three-fourths of the men and at least half the 
women earn $20,000 or more. 


5.13. At a certain bank, the amount of time that a cus- 
tomer spends being served by a teller is an exponential 
random variable with mean 5 minutes. If there is a cus- 
tomer in service when you enter the bank, what is the 
probability that he or she will still be with the teller after 
an additional 4 minutes? 


5.14. Suppose that the cumulative distribution function of
the random variable X is given by
$$F(x) = 1 - e^{-x^2} \qquad x > 0$$

Evaluate (a) $P\{X > 2\}$; (b) $P\{1 < X < 3\}$; (c) the hazard
rate function of F; (d) E[X]; (e) Var(X).


Hint: For parts (d) and (e), you might want to make use of 
the results of Theoretical Exercise 5.5. 


5.15. The number of years that a washing machine func- 
tions is a random variable whose hazard rate function is 
given by 


$$\lambda(t) = \begin{cases} .2 & 0 < t < 2 \\ .2 + .3(t - 2) & 2 \le t < 5 \\ 1.1 & t > 5 \end{cases}$$


(a) What is the probability that the machine will still be 
working 6 years after being purchased? 

(b) If it is still working 6 years after being purchased, what 
is the conditional probability that it will fail within the next 
2 years? 


5.16. A standard Cauchy random variable has density
function
$$f(x) = \frac{1}{\pi(1 + x^2)} \qquad -\infty < x < \infty$$
Show that if X is a standard Cauchy random
variable, then 1/X is also a standard Cauchy random variable.


5.17. A roulette wheel has 38 slots, numbered 0, 00, and 
1 through 36. If you bet 1 on a specified number, then 
you either win 35 if the roulette ball lands on that num- 
ber or lose 1 if it does not. If you continually make such 
bets, approximate the probability that 


(a) you are winning after 34 bets; 


(b) you are winning after 1000 bets; 
(c) you are winning after 100,000 bets. 


Assume that each roll of the roulette ball is equally likely 
to land on any of the 38 numbers. 


5.18. There are two types of batteries in a bin. When in use,
type i batteries last (in hours) an exponentially distributed
time with rate $\lambda_i$, i = 1, 2. A battery that is randomly chosen
from the bin will be a type i battery with probability
$p_i$, $\sum_{i=1}^{2} p_i = 1$. If a randomly chosen battery is still operating
after t hours of use, what is the probability that it will still
be operating after an additional s hours?


5.19. Evidence concerning the guilt or innocence of a 
defendant in a criminal investigation can be summarized 
by the value of an exponential random variable X whose 
mean $\mu$ depends on whether the defendant is guilty. If
innocent, $\mu = 1$; if guilty, $\mu = 2$. The deciding judge will
rule the defendant guilty if X > c for some suitably chosen 
value of c. 


(a) If the judge wants to be 95 percent certain that an inno- 
cent man will not be convicted, what should be the value 
of c? 


(b) Using the value of c found in part (a), what is the 
probability that a guilty defendant will be convicted? 


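5.20. For any real number y, define $y^+$ by
$$y^+ = \begin{cases} y & \text{if } y \ge 0 \\ 0 & \text{if } y < 0 \end{cases}$$
Let c be a constant.

(a) Show that
$$E[(Z - c)^+] = \frac{1}{\sqrt{2\pi}} e^{-c^2/2} - c(1 - \Phi(c))$$
when Z is a standard normal random variable.

(b) Find $E[(X - c)^+]$ when X is normal with mean $\mu$ and
variance $\sigma^2$.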


5.21. With $\Phi(x)$ being the probability that a normal random
variable with mean 0 and variance 1 is less than x,
which of the following are true:

(a) $\Phi(-x) = \Phi(x)$
(b) $\Phi(x) + \Phi(-x) = 1$
(c) $\Phi(-x) = 1/\Phi(x)$


5.22. Let U be a uniform (0, 1) random variable, and let
a < b be constants.

(a) Show that if b > 0, then bU is uniformly distributed
on (0, b), and if b < 0, then bU is uniformly distributed on
(b, 0).

(b) Show that a + U is uniformly distributed on (a, 1 + a).

(c) What function of U is uniformly distributed on (a, b)?

(d) Show that min(U, 1 − U) is a uniform (0, 1/2) random
variable.

(e) Show that max(U, 1 − U) is a uniform (1/2, 1) random
variable.

5.23. Let
$$f(x) = \begin{cases} \frac{1}{3} e^{x} & \text{if } x < 0 \\ \frac{1}{3} & \text{if } 0 \le x < 1 \\ \frac{1}{3} e^{-(x-1)} & \text{if } x \ge 1 \end{cases}$$

(a) Show that f is a probability density function. (That is,
show that $f(x) \ge 0$, and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.)
(b) If X has density function f, find E[X].


5.24. Let
$$f(x) = \frac{\theta^2}{1 + \theta}(1 + x)e^{-\theta x}, \qquad x > 0$$
where $\theta > 0$.

(a) Show that f(x) is a density function. That is, show that
$f(x) \ge 0$, and that $\int_0^{\infty} f(x)\,dx = 1$.
(b) Find E[X].
(c) Find Var(X).


6.1 Joint Distribution Functions


Thus far, we have concerned ourselves only with probability distributions for single 
random variables. However, we are often interested in probability statements con- 
cerning two or more random variables. In order to deal with such probabilities, we 
define, for any two random variables X and Y, the joint cumulative probability dis- 
tribution function of X and Y by 


$$F(a, b) = P\{X \le a, Y \le b\} \qquad -\infty < a, b < \infty$$


All joint probability statements about X and Y can, in theory, be answered in terms 
of their joint distribution function. For instance, 


$$P\{a_1 < X \le a_2,\ b_1 < Y \le b_2\} = F(a_2, b_2) + F(a_1, b_1) - F(a_1, b_2) - F(a_2, b_1) \qquad (1.1)$$
whenever $a_1 < a_2$, $b_1 < b_2$. To verify Equation (1.1), note that for $a_1 < a_2$,
$$P\{X \le a_2, Y \le b\} = P\{X \le a_1, Y \le b\} + P\{a_1 < X \le a_2, Y \le b\}$$
giving that
$$P\{a_1 < X \le a_2, Y \le b\} = F(a_2, b) - F(a_1, b) \qquad (1.2)$$
Also, because for $b_1 < b_2$,
$$P\{a_1 < X \le a_2, Y \le b_2\} = P\{a_1 < X \le a_2, Y \le b_1\} + P\{a_1 < X \le a_2,\ b_1 < Y \le b_2\}$$
we have that when $a_1 < a_2$, $b_1 < b_2$,
$$P\{a_1 < X \le a_2,\ b_1 < Y \le b_2\} = P\{a_1 < X \le a_2, Y \le b_2\} - P\{a_1 < X \le a_2, Y \le b_1\}$$
$$= F(a_2, b_2) - F(a_1, b_2) - F(a_2, b_1) + F(a_1, b_1)$$
where the final equality used Equation (1.2).
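
As a small numerical illustration of Equation (1.1), the following Python sketch computes a rectangle probability from a joint distribution function. The function names and the choice of joint distribution (a point uniform on the unit square, whose distribution function is a product) are assumptions made for the illustration and are not part of the text.

```python
def rectangle_probability(F, a1, a2, b1, b2):
    """P{a1 < X <= a2, b1 < Y <= b2} from a joint CDF F, via Equation (1.1)."""
    if not (a1 < a2 and b1 < b2):
        raise ValueError("need a1 < a2 and b1 < b2")
    return F(a2, b2) + F(a1, b1) - F(a1, b2) - F(a2, b1)


def uniform_square_cdf(a, b):
    """Illustrative joint CDF (an assumption): (X, Y) uniform on the unit square."""
    def clamp(t):
        return min(max(t, 0.0), 1.0)
    return clamp(a) * clamp(b)


prob = rectangle_probability(uniform_square_cdf, 0.2, 0.5, 0.1, 0.4)
print(prob)  # the area of a 0.3-by-0.3 rectangle, up to floating-point rounding
```

For this particular distribution the answer is simply the area of the rectangle, which gives an independent check on the four-term formula.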

When X and Y are discrete random variables, with X taking on one of the values $x_i$,
$i \ge 1$, and Y one of the values $y_j$, $j \ge 1$, it is convenient to define the joint probability
mass function of X and Y by
$$p(x, y) = P\{X = x, Y = y\}$$
Using that the event {X = x} is the union of the mutually exclusive events
$\{X = x, Y = y_j\}$, $j \ge 1$, it follows that the probability mass function of X can be obtained
from the joint probability mass function by
$$p_X(x) = P\{X = x\} = P\Big(\bigcup_j \{X = x, Y = y_j\}\Big) = \sum_j P\{X = x, Y = y_j\} = \sum_j p(x, y_j)$$
Similarly, the probability mass function of Y is obtained from
$$p_Y(y) = \sum_i p(x_i, y)$$


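Example 1a

Suppose that 3 balls are randomly selected from an urn containing 3 red, 4 white, and
5 blue balls. If we let X and Y denote, respectively, the number of red and white balls
chosen, then the joint probability mass function of X and Y, $p(i, j) = P\{X = i, Y = j\}$,
is obtained by noting that X = i, Y = j if, of the 3 balls selected, i are red, j are white,
and 3 − i − j are blue. Because all subsets of size 3 are equally likely to be chosen,
it follows that
$$p(i, j) = \frac{\binom{3}{i}\binom{4}{j}\binom{5}{3-i-j}}{\binom{12}{3}}$$
Consequently,
$$p(0,0) = \binom{5}{3}\Big/\binom{12}{3} = \frac{10}{220} \qquad p(0,1) = \binom{4}{1}\binom{5}{2}\Big/\binom{12}{3} = \frac{40}{220}$$
$$p(0,2) = \binom{4}{2}\binom{5}{1}\Big/\binom{12}{3} = \frac{30}{220} \qquad p(0,3) = \binom{4}{3}\Big/\binom{12}{3} = \frac{4}{220}$$
$$p(1,0) = \binom{3}{1}\binom{5}{2}\Big/\binom{12}{3} = \frac{30}{220} \qquad p(1,1) = \binom{3}{1}\binom{4}{1}\binom{5}{1}\Big/\binom{12}{3} = \frac{60}{220}$$
$$p(1,2) = \binom{3}{1}\binom{4}{2}\Big/\binom{12}{3} = \frac{18}{220} \qquad p(2,0) = \binom{3}{2}\binom{5}{1}\Big/\binom{12}{3} = \frac{15}{220}$$
$$p(2,1) = \binom{3}{2}\binom{4}{1}\Big/\binom{12}{3} = \frac{12}{220} \qquad p(3,0) = \binom{3}{3}\Big/\binom{12}{3} = \frac{1}{220}$$

These probabilities can most easily be expressed in tabular form, as in Table 6.1. The
reader should note that the probability mass function of X is obtained by computing
the row sums, whereas the probability mass function of Y is obtained by computing
the column sums. Because the individual probability mass functions of X and Y
thus appear in the margin of such a table, they are often referred to as the marginal
probability mass functions of X and Y, respectively.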


Table 6.1 P{X = i, Y = j}.

                              j
  i          0         1         2         3      Row sum = P{X = i}
  0       10/220    40/220    30/220     4/220         84/220
  1       30/220    60/220    18/220       0          108/220
  2       15/220    12/220       0         0           27/220
  3        1/220       0         0         0            1/220
  Column sum = P{Y = j}
          56/220   112/220    48/220     4/220
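
The entries of Table 6.1 can be recomputed mechanically. The sketch below is one way to do so in Python with exact fractions; the names `p`, `row_sums`, and `column_sums` are chosen here for illustration. Note that `Fraction` prints values in lowest terms (for example 10/220 appears as 1/22).

```python
from fractions import Fraction
from math import comb

RED, WHITE, BLUE, DRAWN = 3, 4, 5, 3
TOTAL = comb(RED + WHITE + BLUE, DRAWN)  # 220 equally likely subsets of size 3


def p(i, j):
    """Joint pmf P{X = i, Y = j}: i red, j white, and 3 - i - j blue balls."""
    k = DRAWN - i - j
    if i < 0 or j < 0 or k < 0:
        return Fraction(0)
    return Fraction(comb(RED, i) * comb(WHITE, j) * comb(BLUE, k), TOTAL)


values = range(DRAWN + 1)
row_sums = {i: sum(p(i, j) for j in values) for i in values}     # marginal pmf of X
column_sums = {j: sum(p(i, j) for i in values) for j in values}  # marginal pmf of Y

for i in values:
    print(i, [str(p(i, j)) for j in values], "row sum:", row_sums[i])
print("column sums:", [str(column_sums[j]) for j in values])
```

The row and column sums reproduce the marginal probability mass functions shown in the margins of the table.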


Example 1b

Suppose that 15 percent of the families in a certain community have no children, 20
percent have 1 child, 35 percent have 2 children, and 30 percent have 3. Suppose 
further that in each family each child is equally likely (independently) to be a boy or 
a girl. If a family is chosen at random from this community, then B, the number of 
boys, and G, the number of girls, in this family will have the joint probability mass 
function shown in Table 6.2. 

Table 6.2 P{B = i, G = j}.

                              j
  i          0         1         2         3      Row sum = P{B = i}
  0        .15       .10       .0875     .0375         .3750
  1        .10       .175      .1125       0           .3875
  2        .0875     .1125       0         0           .2000
  3        .0375       0         0         0           .0375
  Column sum = P{G = j}
           .3750     .3875     .2000     .0375


The probabilities shown in Table 6.2 are obtained as follows:
$$P\{B = 0, G = 0\} = P\{\text{no children}\} = .15$$
$$P\{B = 0, G = 1\} = P\{\text{1 girl and total of 1 child}\} = P\{\text{1 child}\}P\{\text{1 girl} \mid \text{1 child}\} = (.20)\left(\tfrac{1}{2}\right)$$
$$P\{B = 0, G = 2\} = P\{\text{2 girls and total of 2 children}\} = P\{\text{2 children}\}P\{\text{2 girls} \mid \text{2 children}\} = (.35)\left(\tfrac{1}{2}\right)^2$$
We leave the verification of the remaining probabilities in the table to the reader.
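
The remaining entries can be checked with a few lines of Python. The sketch below conditions on the family size exactly as in the computations above: given i + j children, the number of boys is binomial with parameters i + j and 1/2. The names `FAMILY_SIZE` and `p_boys_girls` are illustrative choices.

```python
from math import comb

# Probability that a family has 0, 1, 2, or 3 children (from the example).
FAMILY_SIZE = {0: 0.15, 1: 0.20, 2: 0.35, 3: 0.30}


def p_boys_girls(i, j):
    """P{B = i, G = j} = P{i + j children} * P{i boys | i + j children}."""
    n = i + j
    if n not in FAMILY_SIZE:
        return 0.0
    return FAMILY_SIZE[n] * comb(n, i) * 0.5 ** n


table = [[p_boys_girls(i, j) for j in range(4)] for i in range(4)]
for i, row in enumerate(table):
    print(i, ["%.4f" % v for v in row], "row sum %.4f" % sum(row))
```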


Example 1c

Consider independent trials where each trial is a success with probability p. Let $X_r$
denote the number of trials until there have been r successes, and let $Y_s$ denote the
number of trials until there have been s failures. Suppose we want to derive their
joint probability mass function $P(X_r = i, Y_s = j)$. To do so, first consider the case
i < j. In this case, write
$$P(X_r = i, Y_s = j) = P(X_r = i)P(Y_s = j \mid X_r = i)$$
Now, if there have been r successes after trial i then there have been i − r failures
by that point. Hence, the conditional distribution of $Y_s$, given that $X_r = i$, is the
distribution of i plus the number of additional trials after trial i until there have been
an additional s − i + r failures. Hence,
$$P(X_r = i, Y_s = j) = P(X_r = i)P(Y_{s-i+r} = j - i), \qquad i < j$$
Because $X_r$ is a negative binomial random variable with parameters (r, p) and $Y_{s-i+r}$
is a negative binomial random variable with parameters (s − i + r, 1 − p), the
preceding yields
$$P(X_r = i, Y_s = j) = \binom{i-1}{r-1} p^r (1-p)^{i-r} \binom{j-i-1}{s-i+r-1} (1-p)^{s-i+r} p^{j-s-r}, \qquad i < j$$
We leave it as an exercise to determine the analogous expression when j < i.
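
One way to gain confidence in the product formula for i < j is to compare it with a brute-force enumeration of all success/failure sequences of length j. The Python sketch below does this for small cases; the function names and the parameter values r = 2, s = 3, p = 0.3 are arbitrary choices for the illustration. The formula is taken to be 0 outside the range where the negative binomial terms make sense (for instance, when s or more failures have already occurred by trial i).

```python
from itertools import product
from math import comb


def joint_pmf(i, j, r, s, p):
    """P(X_r = i, Y_s = j) for i < j, using the product of two negative binomials."""
    extra = s - (i - r)  # failures still needed after trial i
    if i < r or extra < 1 or j - i < extra:
        return 0.0
    q = 1 - p
    first = comb(i - 1, r - 1) * p**r * q ** (i - r)
    second = comb(j - i - 1, extra - 1) * q**extra * p ** (j - i - extra)
    return first * second


def brute_force(i, j, r, s, p):
    """Same probability by summing over every outcome of the first j trials."""
    total = 0.0
    for seq in product((1, 0), repeat=j):  # 1 = success, 0 = failure
        successes = [t for t, x in enumerate(seq, 1) if x == 1]
        failures = [t for t, x in enumerate(seq, 1) if x == 0]
        if (len(successes) >= r and successes[r - 1] == i
                and len(failures) >= s and failures[s - 1] == j):
            total += p ** len(successes) * (1 - p) ** len(failures)
    return total


print(joint_pmf(3, 5, 2, 3, 0.3), brute_force(3, 5, 2, 3, 0.3))
```

The enumeration is exponential in j, so it is only meant as a check for small values.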

We say that X and Y are jointly continuous if there exists a function f(x, y),
defined for all real x and y, having the property that for every set C of pairs of real
numbers (that is, C is a set in the two-dimensional plane),
$$P\{(X, Y) \in C\} = \iint_{(x,y) \in C} f(x, y)\,dx\,dy \qquad (1.3)$$
The function f(x, y) is called the joint probability density function of X and Y. If A
and B are any sets of real numbers, then by defining $C = \{(x, y) : x \in A, y \in B\}$, we
see from Equation (1.3) that
$$P\{X \in A, Y \in B\} = \int_B \int_A f(x, y)\,dx\,dy \qquad (1.4)$$


Because
$$F(a, b) = P\{X \in (-\infty, a], Y \in (-\infty, b]\} = \int_{-\infty}^{b} \int_{-\infty}^{a} f(x, y)\,dx\,dy$$
it follows, upon differentiation, that
$$f(a, b) = \frac{\partial^2}{\partial a\,\partial b} F(a, b)$$
wherever the partial derivatives are defined. Another interpretation of the joint density
function, obtained from Equation (1.4), is
$$P\{a < X < a + da,\ b < Y < b + db\} = \int_b^{b+db} \int_a^{a+da} f(x, y)\,dx\,dy \approx f(a, b)\,da\,db$$
when da and db are small and f(x, y) is continuous at a, b. Hence, f(a, b) is a measure
of how likely it is that the random vector (X, Y) will be near (a, b).

If X and Y are jointly continuous, they are individually continuous, and their
probability density functions can be obtained as follows:
$$P\{X \in A\} = P\{X \in A, Y \in (-\infty, \infty)\} = \int_A \int_{-\infty}^{\infty} f(x, y)\,dy\,dx = \int_A f_X(x)\,dx$$
where
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$
is thus the probability density function of X. Similarly, the probability density function
of Y is given by
$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$


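Example 1d

The joint density function of X and Y is given by
$$f(x, y) = \begin{cases} 2e^{-x}e^{-2y} & 0 < x < \infty,\ 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$
Compute (a) $P\{X > 1, Y < 1\}$, (b) $P\{X < Y\}$, and (c) $P\{X < a\}$.

Solution
(a)
$$P\{X > 1, Y < 1\} = \int_0^1 \int_1^{\infty} 2e^{-x}e^{-2y}\,dx\,dy$$
Now,
$$\int_1^{\infty} e^{-x}\,dx = -e^{-x}\Big|_1^{\infty} = e^{-1}$$
giving that
$$P\{X > 1, Y < 1\} = e^{-1}\int_0^1 2e^{-2y}\,dy = e^{-1}(1 - e^{-2})$$
(b)
$$P\{X < Y\} = \iint_{(x,y):\,x<y} 2e^{-x}e^{-2y}\,dx\,dy = \int_0^{\infty} \int_0^{y} 2e^{-x}e^{-2y}\,dx\,dy$$
$$= \int_0^{\infty} 2e^{-2y}(1 - e^{-y})\,dy = \int_0^{\infty} 2e^{-2y}\,dy - \int_0^{\infty} 2e^{-3y}\,dy = 1 - \frac{2}{3} = \frac{1}{3}$$
(c)
$$P\{X < a\} = \int_0^a \int_0^{\infty} 2e^{-2y}e^{-x}\,dy\,dx = \int_0^a e^{-x}\,dx = 1 - e^{-a}$$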

Example 1e

Consider a circle of radius R, and suppose that a point within the circle is randomly
chosen in such a manner that all regions within the circle of equal area are equally
likely to contain the point. (In other words, the point is uniformly distributed within
the circle.) If we let the center of the circle denote the origin and define X and Y
to be the coordinates of the point chosen (Figure 6.1), then, since (X, Y) is equally
likely to be near each point in the circle, it follows that the joint density function of
X and Y is given by
$$f(x, y) = \begin{cases} c & \text{if } x^2 + y^2 \le R^2 \\ 0 & \text{if } x^2 + y^2 > R^2 \end{cases}$$
for some value of c.

[Figure 6.1 Joint probability distribution: a circle of radius R centered at the origin, with the point (X, Y) marked inside it.]

(a) Determine c.
(b) Find the marginal density functions of X and Y.
(c) Compute the probability that D, the distance from the origin of the point
selected, is less than or equal to a.
(d) Find E[D].


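Solution
(a) Because
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dy\,dx = 1$$
it follows that
$$c \iint_{x^2 + y^2 \le R^2} dy\,dx = 1$$
We can evaluate $\iint_{x^2 + y^2 \le R^2} dy\,dx$ either by using polar coordinates or,
more simply, by noting that it represents the area of the circle and is thus equal
to $\pi R^2$. Hence,
$$c = \frac{1}{\pi R^2}$$

(b)
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \frac{1}{\pi R^2} \int_{x^2 + y^2 \le R^2} dy = \frac{1}{\pi R^2} \int_{-a}^{a} dy, \quad \text{where } a = \sqrt{R^2 - x^2}$$
$$= \frac{2}{\pi R^2} \sqrt{R^2 - x^2}, \qquad x^2 \le R^2$$
and it equals 0 when $x^2 > R^2$. By symmetry, the marginal density of Y is
given by
$$f_Y(y) = \begin{cases} \dfrac{2}{\pi R^2} \sqrt{R^2 - y^2} & y^2 \le R^2 \\ 0 & y^2 > R^2 \end{cases}$$

(c) The distribution function of $D = \sqrt{X^2 + Y^2}$, the distance from the origin, is
obtained as follows: For $0 \le a \le R$,
$$F_D(a) = P\{\sqrt{X^2 + Y^2} \le a\} = P\{X^2 + Y^2 \le a^2\} = \iint_{x^2 + y^2 \le a^2} f(x, y)\,dy\,dx$$
$$= \frac{1}{\pi R^2} \iint_{x^2 + y^2 \le a^2} dy\,dx = \frac{\pi a^2}{\pi R^2} = \frac{a^2}{R^2}$$
where we have used the fact that $\iint_{x^2 + y^2 \le a^2} dy\,dx$ is the area of a circle of
radius a and thus is equal to $\pi a^2$.

(d) From part (c), the density function of D is
$$f_D(a) = \frac{2a}{R^2} \qquad 0 \le a \le R$$
Hence,
$$E[D] = \frac{2}{R^2} \int_0^R a^2\,da = \frac{2R}{3}$$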
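
Parts (c) and (d) lend themselves to a quick simulation check. The Python sketch below samples points uniformly in the circle by rejection from the bounding square (a sampling method chosen here for simplicity, not one used in the text) and compares the sample mean of D with 2R/3 and the fraction of points within distance a with $a^2/R^2$. The values R = 2, a = 1, the seed, and the sample size are arbitrary.

```python
import math
import random


def sample_in_disk(R, rng):
    """Uniform point in the disk of radius R, by rejection from the bounding square."""
    while True:
        x, y = rng.uniform(-R, R), rng.uniform(-R, R)
        if x * x + y * y <= R * R:
            return x, y


rng = random.Random(1)  # fixed seed so that the run is repeatable
R, N = 2.0, 100_000
distances = [math.hypot(*sample_in_disk(R, rng)) for _ in range(N)]

mean_distance = sum(distances) / N                    # compare with 2R/3
a = 1.0
frac_within_a = sum(d <= a for d in distances) / N    # compare with a^2 / R^2

print(mean_distance, "vs", 2 * R / 3)
print(frac_within_a, "vs", a**2 / R**2)
```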


Example 1f

The joint density of X and Y is given by
$$f(x, y) = \begin{cases} e^{-(x+y)} & 0 < x < \infty,\ 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$
Find the density function of the random variable X/Y.

Solution We start by computing the distribution function of X/Y. For a > 0,
$$F_{X/Y}(a) = P\left\{\frac{X}{Y} \le a\right\} = \iint_{x/y \le a} e^{-(x+y)}\,dx\,dy = \int_0^{\infty} \int_0^{ay} e^{-(x+y)}\,dx\,dy$$
$$= \int_0^{\infty} (1 - e^{-ay})e^{-y}\,dy = \left[-e^{-y} + \frac{e^{-(a+1)y}}{a+1}\right]_0^{\infty} = 1 - \frac{1}{a+1}$$
Differentiation shows that the density function of X/Y is given by
$f_{X/Y}(a) = 1/(a + 1)^2$, $0 < a < \infty$.
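
The last two steps of this solution can be checked numerically. The Python sketch below evaluates the single integral $\int_0^\infty (1 - e^{-ay})e^{-y}\,dy$ by the midpoint rule (truncating the range at an assumed upper limit of 50, beyond which the integrand is negligible) and differentiates the closed form $a/(a+1)$ by a central difference. The function names and step sizes are illustrative choices.

```python
import math


def cdf_ratio(a, upper=50.0, steps=100_000):
    """Midpoint-rule value of the integral of (1 - e^{-ay}) e^{-y} over (0, upper)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        total += (1 - math.exp(-a * y)) * math.exp(-y)
    return total * h


def density_ratio(a, eps=1e-4):
    """Central-difference derivative of the closed-form CDF 1 - 1/(a + 1)."""
    def F(t):
        return 1 - 1 / (t + 1)
    return (F(a + eps) - F(a - eps)) / (2 * eps)


a = 2.0
print(cdf_ratio(a), "vs", 1 - 1 / (a + 1))
print(density_ratio(a), "vs", 1 / (a + 1) ** 2)
```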


We can also define joint probability distributions for n random variables in
exactly the same manner as we did for n = 2. For instance, the joint cumulative probability
distribution function $F(a_1, a_2, \ldots, a_n)$ of the n random variables $X_1, X_2, \ldots, X_n$
is defined by
$$F(a_1, a_2, \ldots, a_n) = P\{X_1 \le a_1, X_2 \le a_2, \ldots, X_n \le a_n\}$$
Further, the n random variables are said to be jointly continuous if there exists a
function $f(x_1, x_2, \ldots, x_n)$, called the joint probability density function, such that, for
any set C in n-space,
$$P\{(X_1, X_2, \ldots, X_n) \in C\} = \iint \cdots \int_{(x_1, \ldots, x_n) \in C} f(x_1, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n$$
In particular, for any n sets of real numbers $A_1, A_2, \ldots, A_n$,
$$P\{X_1 \in A_1, X_2 \in A_2, \ldots, X_n \in A_n\} = \int_{A_n} \int_{A_{n-1}} \cdots \int_{A_1} f(x_1, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n$$

The multinomial distribution

One of the most important joint distributions is the multinomial distribution, which
arises when a sequence of n independent and identical experiments is performed.
Suppose that each experiment can result in any one of r possible outcomes, with
respective probabilities $p_1, p_2, \ldots, p_r$, $\sum_{i=1}^{r} p_i = 1$. If we let $X_i$ denote the number of
the n experiments that result in outcome number i, then
$$P\{X_1 = n_1, X_2 = n_2, \ldots, X_r = n_r\} = \frac{n!}{n_1!\,n_2! \cdots n_r!}\, p_1^{n_1} p_2^{n_2} \cdots p_r^{n_r} \qquad (1.5)$$
whenever $\sum_{i=1}^{r} n_i = n$.

Equation (1.5) is verified by noting that any sequence of outcomes for the n
experiments that leads to outcome i occurring $n_i$ times for i = 1, 2, ..., r will, by
the assumed independence of experiments, have probability $p_1^{n_1} p_2^{n_2} \cdots p_r^{n_r}$ of occurring.
Because there are $n!/(n_1!\,n_2! \cdots n_r!)$ such sequences of outcomes (there are
$n!/(n_1! \cdots n_r!)$ different permutations of n things of which $n_1$ are alike, $n_2$ are alike,
..., $n_r$ are alike), Equation (1.5) is established. The joint distribution whose joint
probability mass function is specified by Equation (1.5) is called the multinomial
distribution. Note that when r = 2, the multinomial reduces to the binomial distribution.

Note also that any sum of a fixed set of the $X_i$'s will have a binomial distribution.
That is, if $N \subset \{1, 2, \ldots, r\}$, then $\sum_{i \in N} X_i$ will be a binomial random variable
with parameters n and $p = \sum_{i \in N} p_i$. This follows because $\sum_{i \in N} X_i$ represents the
number of the n experiments whose outcome is in N, and each experiment will independently
have such an outcome with probability $\sum_{i \in N} p_i$.


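As an application of the multinomial distribution, suppose that a fair die is rolled
9 times. The probability that 1 appears three times, 2 and 3 twice each, 4 and 5 once
each, and 6 not at all is
$$\frac{9!}{3!\,2!\,2!\,1!\,1!\,0!} \left(\frac{1}{6}\right)^3 \left(\frac{1}{6}\right)^2 \left(\frac{1}{6}\right)^2 \left(\frac{1}{6}\right)^1 \left(\frac{1}{6}\right)^1 \left(\frac{1}{6}\right)^0 = \frac{9!}{3!\,2!\,2!} \left(\frac{1}{6}\right)^9$$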
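
Equation (1.5) translates directly into code. The following Python sketch (the function name `multinomial_pmf` is an illustrative choice) evaluates the multinomial probability in exact rational arithmetic and applies it to the die example.

```python
from fractions import Fraction
from math import factorial


def multinomial_pmf(counts, probs):
    """P{X_1 = n_1, ..., X_r = n_r} from Equation (1.5), in exact arithmetic."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)  # each partial quotient is an integer
    value = Fraction(coefficient)
    for c, p in zip(counts, probs):
        value *= Fraction(p) ** c
    return value


die = [Fraction(1, 6)] * 6
answer = multinomial_pmf([3, 2, 2, 1, 1, 0], die)
print(answer, float(answer))
```

With r = 2 the same function returns binomial probabilities, in line with the remark that the multinomial reduces to the binomial in that case.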


We can also use the multinomial distribution to analyze a variation of the classical
birthday problem which asks for the probability that no 3 people in a group of size
n have the same birthday when the birthdays of the n people are independent and
each birthday is equally likely to be any of the 365 days of the year. Because this
probability is 0 when n > 730 (why is this?), we will suppose that $n \le 730$. To find
the desired probability, note that there will be no set of 3 people having the same
birthday if each of the 365 days of the year is the birthday of at most 2 persons. Now,
this will be the case if for some $i \le n/2$ the event $A_i$ occurs, where $A_i$ is the event
that the 365 days of the year can be partitioned into three groups of respective sizes
i, n − 2i, and 365 − n + i such that every day in the first group is the birthday of
exactly 2 of the n individuals, every day in the second group is the birthday of exactly
1 of the n individuals, and every day in the third group is the birthday of none of the
n individuals. Now, because each day of the year is equally likely to be the birthday
of an individual, it follows, for a given partition of the 365 days into three groups of
respective sizes i, n − 2i, and 365 − n + i, that the probability each day in the first
group is the birthday of exactly 2 of the n individuals, each day in the second group
is the birthday of exactly 1 of the n individuals, and each day in the third group is the
birthday of none of the n individuals is equal to the multinomial probability
$$\frac{n!}{(2!)^i\,(1!)^{n-2i}\,(0!)^{365-n+i}} \left(\frac{1}{365}\right)^n$$
As the number of partitions of the 365 days of the year into 3 groups of respective
sizes i, n − 2i, 365 − n + i is $\frac{365!}{i!\,(n-2i)!\,(365-n+i)!}$, it follows that
$$P(A_i) = \frac{365!}{i!\,(n-2i)!\,(365-n+i)!} \cdot \frac{n!}{2^i} \left(\frac{1}{365}\right)^n, \qquad i \le n/2$$
As the events $A_i$, $i \le n/2$, are mutually exclusive we have that
$$P\{\text{no set of three with same birthday}\} = \sum_{i=0}^{[n/2]} \frac{365!}{i!\,(n-2i)!\,(365-n+i)!} \cdot \frac{n!}{2^i} \left(\frac{1}{365}\right)^n$$
When n = 88, the preceding gives
$$P\{\text{no set of three with same birthday}\} = \sum_{i=0}^{44} \frac{365!}{i!\,(88-2i)!\,(277+i)!} \cdot \frac{88!}{2^i} \left(\frac{1}{365}\right)^{88} \approx .504$$
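
The sum above is tedious by hand but straightforward to evaluate exactly on a computer. The Python sketch below is one way to do it with integer arithmetic; the function name is an illustrative choice, and the lower summation limit max(0, n − 365) is an addition made here so that 365 − n + i is never negative when n exceeds 365.

```python
from fractions import Fraction
from math import factorial

DAYS = 365


def prob_no_three_share(n):
    """P{no 3 of n people share a birthday}, summing P(A_i) over feasible i."""
    if n > 2 * DAYS:
        return Fraction(0)
    total = 0
    # i days carry exactly 2 birthdays, n - 2i days carry 1, the rest carry none.
    for i in range(max(0, n - DAYS), n // 2 + 1):
        partitions = factorial(DAYS) // (
            factorial(i) * factorial(n - 2 * i) * factorial(DAYS - n + i)
        )
        assignments = factorial(n) // 2**i  # multinomial coefficient n!/(2!)^i
        total += partitions * assignments
    return Fraction(total, DAYS**n)


for n in (3, 50, 88):
    print(n, float(prob_no_three_share(n)))
```

A simple sanity check is n = 3, where the only way to fail is for all three people to share one day, so the probability must be $1 - 1/365^2$. The value printed for n = 88 can be compared with the figure quoted above.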


6.2 Independent Random Variables 


The random variables X and Y are said to be independent if, for any two sets of real 
numbers A and B, 


P(X € A,Y € B} = PLX € A}PLY € B} (2.1) 


In other words, X and Y are independent if, for all A and B, the events E4, = 
{X € A} and Fg = {Y ¢€ B} are independent. 

It can be shown by using the three axioms of probability that Equation (2.1) will 
follow if and only if, for all a, b, 


PUX <a,Y = b}=P{X < a}P{Y S bd} 


Hence, in terms of the joint distribution function F of X and Y, X and Y are inde- 
pendent if 
F(a,b) = Fx(a)Fy(b) for all a,b 


When X and Y are discrete random variables, the condition of independence (2.1) 
is equivalent to 


$$p(x, y) = p_X(x)p_Y(y) \qquad \text{for all } x, y \qquad (2.2)$$


The equivalence follows because, if Equation (2.1) is satisfied, then we obtain Equa- 
tion (2.2) by letting A and B be, respectively, the one-point sets A = {x} and B = {y}. 
Furthermore, if Equation (2.2) is valid, then for any sets A, B,


$$\begin{aligned}
P\{X \in A, Y \in B\} &= \sum_{y \in B}\sum_{x \in A} p(x, y) \\
&= \sum_{y \in B}\sum_{x \in A} p_X(x)p_Y(y) \\
&= \sum_{y \in B} p_Y(y) \sum_{x \in A} p_X(x) \\
&= P\{Y \in B\}P\{X \in A\}
\end{aligned}$$

and Equation (2.1) is established.
In the jointly continuous case, the condition of independence is equivalent to 


$$f(x, y) = f_X(x)f_Y(y) \qquad \text{for all } x, y$$


Thus, loosely speaking, X and Y are independent if knowing the value of one 
does not change the distribution of the other. Random variables that are not inde- 
pendent are said to be dependent. 


Suppose that n + m independent trials having a common probability of success p are
performed. If X is the number of successes in the first n trials, and Y is the number
of successes in the final m trials, then X and Y are independent, since knowing the
number of successes in the first n trials does not affect the distribution of the number
of successes in the final m trials (by the assumption of independent trials). In fact,
for integral x and y,


$$\begin{aligned}
P\{X = x, Y = y\} &= \binom{n}{x}p^{x}(1-p)^{n-x}\binom{m}{y}p^{y}(1-p)^{m-y}, \qquad 0 \le x \le n,\ 0 \le y \le m \\
&= P\{X = x\}P\{Y = y\}
\end{aligned}$$


In contrast, X and Z will be dependent, where Z is the total number of successes in
the n + m trials. (Why?)


Suppose that the number of people who enter a post office on a given day is a Poisson
random variable with parameter $\lambda$. Show that if each person who enters the post
office is a male with probability p and a female with probability 1 − p, then the numbers
of males and females entering the post office are independent Poisson random
variables with respective parameters $\lambda p$ and $\lambda(1 - p)$.


Solution Let X and Y denote, respectively, the number of males and females that
enter the post office. We shall show the independence of X and Y by establishing
Equation (2.2). To obtain an expression for $P\{X = i, Y = j\}$, we condition on
whether or not $X + Y = i + j$. This gives:


$$\begin{aligned}
P\{X = i, Y = j\} &= P\{X = i, Y = j \mid X + Y = i + j\}P\{X + Y = i + j\} \\
&\quad + P\{X = i, Y = j \mid X + Y \ne i + j\}P\{X + Y \ne i + j\}
\end{aligned}$$


[Note that this equation is merely a special case of the formula $P(E) = P(E|F)P(F) +
P(E|F^c)P(F^c)$.]
Since $P\{X = i, Y = j \mid X + Y \ne i + j\}$ is clearly 0, we obtain


$$P\{X = i, Y = j\} = P\{X = i, Y = j \mid X + Y = i + j\}P\{X + Y = i + j\} \qquad (2.3)$$


Now, because X + Y is the total number of people who enter the post office, it
follows, by assumption, that


$$P\{X + Y = i + j\} = e^{-\lambda}\frac{\lambda^{i+j}}{(i+j)!} \qquad (2.4)$$

Furthermore, given that i + j people do enter the post office, since each person
entering will be male with probability p, it follows that the probability that exactly
i of them will be male (and thus j of them female) is just the binomial probability
$\binom{i+j}{i}p^{i}(1-p)^{j}$. That is,

$$P\{X = i, Y = j \mid X + Y = i + j\} = \binom{i+j}{i}p^{i}(1-p)^{j} \qquad (2.5)$$


Substituting Equations (2.4) and (2.5) into Equation (2.3) yields


$$\begin{aligned}
P\{X = i, Y = j\} &= \binom{i+j}{i}p^{i}(1-p)^{j}e^{-\lambda}\frac{\lambda^{i+j}}{(i+j)!} \\
&= e^{-\lambda}\frac{(\lambda p)^{i}[\lambda(1-p)]^{j}}{i!\,j!} \\
&= \frac{e^{-\lambda p}(\lambda p)^{i}}{i!}\,e^{-\lambda(1-p)}\frac{[\lambda(1-p)]^{j}}{j!}
\end{aligned} \qquad (2.6)$$

Hence,

$$P\{X = i\} = e^{-\lambda p}\frac{(\lambda p)^{i}}{i!}\sum_{j} e^{-\lambda(1-p)}\frac{[\lambda(1-p)]^{j}}{j!} = e^{-\lambda p}\frac{(\lambda p)^{i}}{i!} \qquad (2.7)$$

and similarly,

$$P\{Y = j\} = e^{-\lambda(1-p)}\frac{[\lambda(1-p)]^{j}}{j!} \qquad (2.8)$$

Equations (2.6), (2.7), and (2.8) establish the desired result.
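
The factorization in Equation (2.6) can be checked numerically. The sketch below is illustrative only: the values of `lam` and `p` are arbitrary assumptions, and the helper names are hypothetical. It computes the joint mass function by conditioning on the total, as in Equations (2.3) to (2.5), and compares it with the product of the two Poisson mass functions.

```python
from math import comb, exp, factorial


def poisson_pmf(k, mean):
    """Poisson probability mass function."""
    return exp(-mean) * mean ** k / factorial(k)


def joint_pmf(i, j, lam, p):
    """P{X = i, Y = j}: binomial split of the total (2.5) times P{X + Y = i + j} (2.4)."""
    total = i + j
    return comb(total, i) * p ** i * (1 - p) ** j * poisson_pmf(total, lam)


lam, p = 4.0, 0.3  # illustrative values, not taken from the text

# largest discrepancy between the joint pmf and the product of the marginals
max_gap = max(
    abs(joint_pmf(i, j, lam, p)
        - poisson_pmf(i, lam * p) * poisson_pmf(j, lam * (1 - p)))
    for i in range(15)
    for j in range(15)
)

if __name__ == "__main__":
    print(max_gap)  # differs from zero only by floating-point rounding
```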


Example 2c A man and a woman decide to meet at a certain location. If each of them
independently arrives at a time uniformly distributed between 12 noon and 1 P.M., find the
probability that the first to arrive has to wait longer than 10 minutes.


Solution If we let X and Y denote, respectively, the time past 12 that the man and
the woman arrive, then X and Y are independent random variables, each of which is
uniformly distributed over (0, 60). The desired probability, $P\{X + 10 < Y\} + P\{Y +
10 < X\}$, which, by symmetry, equals $2P\{X + 10 < Y\}$, is obtained as follows:


$$\begin{aligned}
2P\{X + 10 < Y\} &= 2\iint_{x+10<y} f(x, y)\,dx\,dy \\
&= 2\iint_{x+10<y} f_X(x)f_Y(y)\,dx\,dy \\
&= 2\int_{10}^{60}\int_{0}^{y-10}\left(\frac{1}{60}\right)^{2}dx\,dy \\
&= \frac{2}{(60)^{2}}\int_{10}^{60}(y - 10)\,dy \\
&= \frac{25}{36}
\end{aligned}$$
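
As a check on the value 25/36, the following sketch (an illustration, with hypothetical function names and an arbitrary seed) computes the same area ratio exactly and also estimates the probability by simulating the two arrival times.

```python
import random
from fractions import Fraction


def exact_wait_probability(window=60, wait=10):
    """Twice the area of the triangle {x + wait < y} divided by the area of the square."""
    return Fraction((window - wait) ** 2, window ** 2)


def simulated_wait_probability(trials=200_000, window=60.0, wait=10.0, seed=0):
    """Monte Carlo estimate of P{|X - Y| > wait} for independent uniform arrivals."""
    rng = random.Random(seed)
    hits = sum(
        abs(rng.uniform(0, window) - rng.uniform(0, window)) > wait
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    print(exact_wait_probability(), simulated_wait_probability())
```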


Our next example presents the oldest problem dealing with geometrical probabilities.
It was first considered and solved by Buffon, a French naturalist of the
eighteenth century, and is usually referred to as Buffon's needle problem.

Buffon's needle problem


A table is ruled with equidistant parallel lines a distance D apart. A needle of length L,
where $L \le D$, is randomly thrown on the table. What is the probability that the needle
will intersect one of the lines (the other possibility being that the needle will be
completely contained in the strip between two lines)?


Solution Let us determine the position of the needle by specifying (1) the distance X
from the middle point of the needle to the nearest parallel line and (2) the angle $\theta$
between the needle and the projected line of length X. (See Figure 6.2.) The needle
will intersect a line if the hypotenuse of the right triangle in Figure 6.2 is less than
L/2, that is, if


$$\frac{X}{\cos\theta} < \frac{L}{2} \qquad \text{or} \qquad X < \frac{L}{2}\cos\theta$$

Figure 6.2


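As X varies between 0 and D/2 and $\theta$ between 0 and $\pi/2$, it is reasonable to assume
that they are independent, uniformly distributed random variables over these respective
ranges. Hence,


$$\begin{aligned}
P\left\{X < \frac{L}{2}\cos\theta\right\} &= \iint_{x < (L/2)\cos y} f_X(x)f_\theta(y)\,dx\,dy \\
&= \frac{4}{\pi D}\int_{0}^{\pi/2}\int_{0}^{(L/2)\cos y} dx\,dy \\
&= \frac{4}{\pi D}\int_{0}^{\pi/2}\frac{L}{2}\cos y\,dy \\
&= \frac{2L}{\pi D}
\end{aligned}$$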
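
The result $2L/(\pi D)$ can be checked by simulation. The sketch below is an illustration under the same modelling assumption used in the solution (X uniform on (0, D/2), $\theta$ uniform on (0, $\pi/2$), independent); the function name, default lengths, and seed are hypothetical choices.

```python
import math
import random


def buffon_estimate(L=1.0, D=2.0, trials=200_000, seed=1):
    """Estimate the probability that a needle of length L <= D crosses a line."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, D / 2)            # distance from needle midpoint to nearest line
        theta = rng.uniform(0, math.pi / 2)  # angle between needle and the perpendicular
        if x < (L / 2) * math.cos(theta):    # crossing condition derived above
            hits += 1
    return hits / trials


if __name__ == "__main__":
    L, D = 1.0, 2.0
    print(buffon_estimate(L, D), 2 * L / (math.pi * D))
```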


Characterization of the normal distribution 


Let X and Y denote the horizontal and vertical miss distances when a bullet is fired 
at a target, and assume that 


1. X and Y are independent continuous random variables having differentiable 
density functions. 
2. The joint density $f(x, y) = f_X(x)f_Y(y)$ of X and Y depends on (x, y) only
through $x^2 + y^2$.
Loosely put, assumption 2 states that the probability of the bullet landing on any
point of the xy plane depends only on the distance of the point from the target and
not on its angle of orientation. An equivalent way of phrasing this assumption is to
say that the joint density function is rotation invariant.
It is a rather interesting fact that assumptions 1 and 2 imply that X and Y are
normally distributed random variables. To prove this, note first that the assumptions
yield the relation


$$f(x, y) = f_X(x)f_Y(y) = g(x^2 + y^2) \qquad (2.9)$$

for some function g. Differentiating Equation (2.9) with respect to x yields

$$f_X'(x)f_Y(y) = 2x\,g'(x^2 + y^2) \qquad (2.10)$$

Dividing Equation (2.10) by Equation (2.9) gives


$$\frac{f_X'(x)}{f_X(x)} = \frac{2x\,g'(x^2 + y^2)}{g(x^2 + y^2)}$$

or

$$\frac{f_X'(x)}{2x f_X(x)} = \frac{g'(x^2 + y^2)}{g(x^2 + y^2)} \qquad (2.11)$$


Because the value of the left-hand side of Equation (2.11) depends only on x,
whereas the value of the right-hand side depends on $x^2 + y^2$, it follows that the
left-hand side must be the same for all x. To see this, consider any $x_1, x_2$ and let $y_1, y_2$ be
such that $x_1^2 + y_1^2 = x_2^2 + y_2^2$. Then, from Equation (2.11), we obtain


$$\frac{f_X'(x_1)}{2x_1 f_X(x_1)} = \frac{g'(x_1^2 + y_1^2)}{g(x_1^2 + y_1^2)} = \frac{g'(x_2^2 + y_2^2)}{g(x_2^2 + y_2^2)} = \frac{f_X'(x_2)}{2x_2 f_X(x_2)}$$


Hence,

$$\frac{f_X'(x)}{x f_X(x)} = c \qquad \text{or} \qquad \frac{d}{dx}\left(\log f_X(x)\right) = cx$$

which implies, upon integration of both sides, that

$$\log f_X(x) = a + \frac{cx^2}{2} \qquad \text{or} \qquad f_X(x) = ke^{cx^2/2}$$


Since $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$, it follows that c is necessarily negative, and we may write
$c = -1/\sigma^2$. Thus,

$$f_X(x) = ke^{-x^2/2\sigma^2}$$


That is, X is a normal random variable with parameters $\mu = 0$ and $\sigma^2$. A similar
argument can be applied to $f_Y(y)$ to show that


$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\bar{\sigma}}e^{-y^2/2\bar{\sigma}^2}$$

Furthermore, it follows from assumption 2 that $\sigma^2 = \bar{\sigma}^2$ and that X and Y are thus
independent, identically distributed normal random variables with parameters $\mu = 0$
and $\sigma^2$.


A necessary and sufficient condition for the random variables X and Y to be
independent is for their joint probability density function (or joint probability mass
function in the discrete case) f(x, y) to factor into two terms, one depending only on
x and the other depending only on y.

Proposition 2.1 The continuous (discrete) random variables X and Y are independent if and only if
their joint probability density (mass) function can be expressed as


$$f_{X,Y}(x, y) = h(x)g(y) \qquad -\infty < x < \infty,\ -\infty < y < \infty$$


Proof Let us give the proof in the continuous case. First, note that independence
implies that the joint density is the product of the marginal densities of X and Y, so
the preceding factorization will hold when the random variables are independent.
Now, suppose that


$$f_{X,Y}(x, y) = h(x)g(y)$$

Then


$$\begin{aligned}
1 &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} h(x)\,dx \int_{-\infty}^{\infty} g(y)\,dy \\
&= C_1 C_2
\end{aligned}$$

where $C_1 = \int_{-\infty}^{\infty} h(x)\,dx$ and $C_2 = \int_{-\infty}^{\infty} g(y)\,dy$. Also,


$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy = C_2 h(x)$$
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx = C_1 g(y)$$


Since $C_1 C_2 = 1$, it follows that


$$f_{X,Y}(x, y) = f_X(x)f_Y(y)$$


and the proof is complete.


Example 2f If the joint density function of X and Y is

$$f(x, y) = 6e^{-2x}e^{-3y} \qquad 0 < x < \infty,\ 0 < y < \infty$$

and is equal to 0 outside this region, are the random variables independent? What if
the joint density function is


$$f(x, y) = 24xy \qquad 0 < x < 1,\ 0 < y < 1,\ 0 < x + y < 1$$


and is equal to 0 otherwise?


Solution In the first instance, the joint density function factors, and thus the random
variables are independent (with one being exponential with rate 2 and the other
exponential with rate 3). In the second instance, because the region in which the
joint density is nonzero cannot be expressed in the form $x \in A, y \in B$, the joint
density does not factor, so the random variables are not independent. This can be
seen clearly by letting


$$I(x, y) = \begin{cases} 1 & \text{if } 0 < x < 1,\ 0 < y < 1,\ 0 < x + y < 1 \\ 0 & \text{otherwise} \end{cases}$$


and writing

$$f(x, y) = 24xy\,I(x, y)$$

which clearly does not factor into a part depending only on x and another depending
only on y.
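
A single point is enough to see the failure of independence in the second case. The sketch below is illustrative; the marginal $12t(1-t)^2$ is an added calculation (the integral of $24ty$ over $0 < y < 1 - t$), not a formula from the text, and the function names are hypothetical.

```python
def joint(x, y):
    """Second joint density of the example: 24xy on the triangle, 0 elsewhere."""
    inside = 0 < x < 1 and 0 < y < 1 and x + y < 1
    return 24 * x * y if inside else 0.0


def marginal(t):
    """Marginal density of X (and, by symmetry, of Y): integral of 24*t*y over 0 < y < 1 - t."""
    return 12 * t * (1 - t) ** 2 if 0 < t < 1 else 0.0


if __name__ == "__main__":
    x, y = 0.6, 0.6
    # The point lies outside the triangle, so the joint density vanishes there,
    # while both marginal densities are strictly positive.
    print(joint(x, y), marginal(x) * marginal(y))
```

If X and Y were independent, the two printed numbers would have to agree at every point.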

The concept of independence may, of course, be defined for more than two
random variables. In general, the n random variables $X_1, X_2, \ldots, X_n$ are said to be
independent if, for all sets of real numbers $A_1, A_2, \ldots, A_n$,


$$P\{X_1 \in A_1, X_2 \in A_2, \ldots, X_n \in A_n\} = \prod_{i=1}^{n} P\{X_i \in A_i\}$$


As before, it can be shown that this condition is equivalent to


$$P\{X_1 \le a_1, X_2 \le a_2, \ldots, X_n \le a_n\} = \prod_{i=1}^{n} P\{X_i \le a_i\} \qquad \text{for all } a_1, a_2, \ldots, a_n$$


Finally, we say that an infinite collection of random variables is independent if every 
finite subcollection of them is independent. 


How can a computer choose a random subset? 


Most computers are able to generate the value of, or simulate, a uniform (0, 1)
random variable by means of a built-in subroutine that (to a high degree of
approximation) produces such "random numbers." As a result, it is quite easy for a computer
to simulate an indicator (that is, a Bernoulli) random variable. Suppose I is an
indicator variable such that


$$P\{I = 1\} = p = 1 - P\{I = 0\}$$

The computer can simulate I by choosing a uniform (0, 1) random number U and
then letting

$$I = \begin{cases} 1 & \text{if } U < p \\ 0 & \text{if } U \ge p \end{cases}$$


Suppose that we are interested in having the computer select k, $k \le n$, of the numbers
1, 2, ..., n in such a way that each of the $\binom{n}{k}$ subsets of size k is equally likely
to be chosen. We now present a method that will enable the computer to solve this
task. To generate such a subset, we will first simulate, in sequence, n indicator variables
$I_1, I_2, \ldots, I_n$, of which exactly k will equal 1. Those i for which $I_i = 1$ will then
constitute the desired subset.


To generate the random variables $I_1, \ldots, I_n$, start by simulating n independent
uniform (0, 1) random variables $U_1, U_2, \ldots, U_n$. Now define

$$I_1 = \begin{cases} 1 & \text{if } U_1 < \dfrac{k}{n} \\ 0 & \text{otherwise} \end{cases}$$

and then, once $I_1, \ldots, I_i$ are determined, recursively set

$$I_{i+1} = \begin{cases} 1 & \text{if } U_{i+1} < \dfrac{k - (I_1 + \cdots + I_i)}{n - i} \\ 0 & \text{otherwise} \end{cases}$$

In words, at the (i + 1)th stage, we set $I_{i+1}$ equal to 1 (and thus put i + 1 into
the desired subset) with a probability equal to the remaining number of places in
the subset (namely, $k - \sum_{j=1}^{i} I_j$), divided by the remaining number of possibilities
(namely, n − i). Hence, the joint distribution of $I_1, I_2, \ldots, I_n$ is determined from


$$P\{I_1 = 1\} = \frac{k}{n}$$

$$P\{I_{i+1} = 1 \mid I_1, \ldots, I_i\} = \frac{k - \sum_{j=1}^{i} I_j}{n - i}, \qquad 1 \le i < n$$

The proof that the preceding formula results in all subsets of size k being equally
likely to be chosen is by induction on k + n. It is immediate when k + n = 2 (that
is, when k = 1, n = 1), so assume it to be true whenever $k + n \le l$. Now, suppose
that k + n = l + 1, and consider any subset of size k (say, $i_1 \le i_2 \le \cdots \le i_k$) and
consider the following two cases.


Case 1: $i_1 = 1$


$$\begin{aligned}
&P\{I_1 = I_{i_2} = \cdots = I_{i_k} = 1,\ I_j = 0 \text{ otherwise}\} \\
&\quad = P\{I_1 = 1\}P\{I_{i_2} = \cdots = I_{i_k} = 1,\ I_j = 0 \text{ otherwise} \mid I_1 = 1\}
\end{aligned}$$


Now given that $I_1 = 1$, the remaining elements of the subset are chosen as if a
subset of size k − 1 were to be chosen from the n − 1 elements 2, 3, ..., n. Hence, by
the induction hypothesis, the conditional probability that this will result in a given
subset of size k − 1 being selected is $1\big/\binom{n-1}{k-1}$. Hence,


$$P\{I_1 = I_{i_2} = \cdots = I_{i_k} = 1,\ I_j = 0 \text{ otherwise}\} = \frac{k}{n}\,\frac{1}{\binom{n-1}{k-1}} = \frac{1}{\binom{n}{k}}$$

Case 2: $i_1 \ne 1$

$$\begin{aligned}
&P\{I_{i_1} = I_{i_2} = \cdots = I_{i_k} = 1,\ I_j = 0 \text{ otherwise}\} \\
&\quad = P\{I_{i_1} = \cdots = I_{i_k} = 1,\ I_j = 0 \text{ otherwise} \mid I_1 = 0\}P\{I_1 = 0\} \\
&\quad = \frac{1}{\binom{n-1}{k}}\left(1 - \frac{k}{n}\right) = \frac{1}{\binom{n}{k}}
\end{aligned}$$


where the induction hypothesis was used to evaluate the preceding conditional
probability.
Thus, in all cases, the probability that a given subset of size k will be the subset
chosen is $1\big/\binom{n}{k}$.


Remark The foregoing method for generating a random subset has a very low
memory requirement. A faster algorithm that requires somewhat more memory is
presented in Section 10.1. (The latter algorithm uses the last k elements of a random
permutation of 1, 2, ..., n.)
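
The sequential method described above can be sketched in a few lines of Python. This is an illustration rather than a reference implementation: the names `random_subset` and `subset_frequencies` are hypothetical, and Python's `random()` stands in for the built-in uniform (0, 1) subroutine.

```python
import random
from collections import Counter


def random_subset(n, k, rng=random):
    """Choose k of the numbers 1..n so that every size-k subset is equally likely."""
    chosen = []
    for i in range(n):  # i = how many of the numbers have been examined so far
        remaining_slots = k - len(chosen)
        # I_{i+1} = 1 exactly when U_{i+1} < (k - I_1 - ... - I_i) / (n - i)
        if rng.random() < remaining_slots / (n - i):
            chosen.append(i + 1)
    return chosen


def subset_frequencies(n, k, trials, seed=0):
    """Count how often each subset appears over repeated runs."""
    rng = random.Random(seed)
    return Counter(tuple(random_subset(n, k, rng)) for _ in range(trials))


if __name__ == "__main__":
    for subset, count in sorted(subset_frequencies(4, 2, 60_000).items()):
        print(subset, count)
```

Only the running count of selected elements has to be remembered between stages, which is the low memory requirement mentioned in the remark.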


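Let X, Y, Z be independent and uniformly distributed over (0, 1). Compute
$P\{X \ge YZ\}$.


Solution Since


$$f_{X,Y,Z}(x, y, z) = f_X(x)f_Y(y)f_Z(z) = 1, \qquad 0 \le x \le 1,\ 0 \le y \le 1,\ 0 \le z \le 1$$


we have


$$\begin{aligned}
P\{X \ge YZ\} &= \iiint_{x \ge yz} f_{X,Y,Z}(x, y, z)\,dx\,dy\,dz \\
&= \int_{0}^{1}\int_{0}^{1}\int_{yz}^{1} dx\,dy\,dz \\
&= \int_{0}^{1}\int_{0}^{1} (1 - yz)\,dy\,dz \\
&= \int_{0}^{1}\left(1 - \frac{z}{2}\right)dz \\
&= \frac{3}{4}
\end{aligned}$$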
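
The triple integral evaluates to 3/4, which a short simulation can confirm. The sketch below is illustrative; the function name, trial count, and seed are arbitrary assumptions.

```python
import random


def estimate_p_x_ge_yz(trials=200_000, seed=7):
    """Monte Carlo estimate of P{X >= YZ} for independent uniform (0, 1) variables."""
    rng = random.Random(seed)
    hits = sum(rng.random() >= rng.random() * rng.random() for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(estimate_p_x_ge_yz())  # should be close to 3/4
```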


Probabilistic interpretation of half-life 


Let N(t) denote the number of nuclei contained in a radioactive mass of material at
time t. The concept of half-life is often defined in a deterministic fashion by stating
that it is an empirical fact that, for some value h, called the half-life,


$$N(t) = 2^{-t/h}N(0) \qquad t > 0$$


[Note that N(h) = N(0)/2.] Since the preceding implies that, for any nonnegative s
and t,

$$N(t + s) = 2^{-(s+t)/h}N(0) = 2^{-t/h}N(s)$$


it follows that no matter how much time s has already elapsed, in an additional time
t, the number of existing nuclei will decrease by the factor $2^{-t/h}$.

Because the deterministic relationship just given results from observations of
radioactive masses containing huge numbers of nuclei, it would seem that it might
be consistent with a probabilistic interpretation. The clue to deriving the appropriate
probability model for half-life resides in the empirical observation that the proportion
of decay in any time interval depends neither on the total number of nuclei at
the beginning of the interval nor on the location of this interval [since N(t + s)/N(s)
depends neither on N(s) nor on s]. Thus, it appears that the individual nuclei act
independently and with a memoryless life distribution. Consequently, since the unique
life distribution that is memoryless is the exponential distribution, and since exactly
one-half of a given amount of mass decays every h time units, we propose the
following probabilistic model for radioactive decay.


Probabilistic interpretation of the half-life h: The lifetimes of the individual nuclei
are independent random variables having a life distribution that is exponential with
median equal to h. That is, if L represents the lifetime of a given nucleus, then

$$P\{L < t\} = 1 - 2^{-t/h}$$


(Because $P\{L < h\} = \frac{1}{2}$ and the preceding can be written as


$$P\{L < t\} = 1 - \exp\left\{-t\,\frac{\log 2}{h}\right\}$$

it can be seen that L indeed has an exponential distribution with median h.)

Note that under the probabilistic interpretation of half-life just given, if one
starts with N(0) nuclei at time 0, then N(t), the number of nuclei that remain at
time t, will have a binomial distribution with parameters n = N(0) and $p = 2^{-t/h}$.
Results of Chapter 8 will show that this interpretation of half-life is consistent with 
the deterministic model when considering the proportion of a large number of nuclei 
that decay over a given time frame. However, the difference between the determinis- 
tic and probabilistic interpretation becomes apparent when one considers the actual 
number of decayed nuclei. We will now indicate this with regard to the question of 
whether protons decay. 

There is some controversy over whether or not protons decay. Indeed, one theory
predicts that protons should decay with a half-life of about $h = 10^{30}$ years. To
check this prediction empirically, it has been suggested that one follow a large number
of protons for, say, one or two years and determine whether any of them decay
within that period. (Clearly, it would not be feasible to follow a mass of protons for
$10^{30}$ years to see whether one-half of it decays.) Let us suppose that we are able to
keep track of $N(0) = 10^{30}$ protons for c years. The number of decays predicted by
the deterministic model would then be given by


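$$\begin{aligned}
N(0) - N(c) &= h(1 - 2^{-c/h}) \\
&= \frac{1 - 2^{-c/h}}{1/h} \\
&\approx \lim_{x \to 0} \frac{1 - 2^{-cx}}{x} \qquad \text{since } \frac{1}{h} = 10^{-30} \approx 0 \\
&= \lim_{x \to 0} \left(c\,2^{-cx}\log 2\right) \qquad \text{by L'Hôpital's rule} \\
&= c\log 2 \approx .6931c
\end{aligned}$$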


For instance, the deterministic model predicts that in 2 years there should be 1.3863
decays, and it would thus appear to be a serious blow to the hypothesis that protons
decay with a half-life of $10^{30}$ years if no decays are observed over those 2 years.

Let us now contrast the conclusions just drawn with those obtained from the
probabilistic model. Again, let us consider the hypothesis that the half-life of
protons is $h = 10^{30}$ years, and suppose that we follow h protons for c years. Since
there is a huge number of independent protons, each of which will have a very small
probability of decaying within this time period, it follows that the number of protons
that decay will have (to a very strong approximation) a Poisson distribution with
parameter equal to $h(1 - 2^{-c/h}) \approx c\log 2$. Thus,


$$P\{0 \text{ decays}\} = e^{-c\log 2} = e^{-\log(2^{c})} = \frac{1}{2^{c}}$$

and, in general,

$$P\{n \text{ decays}\} = \frac{2^{-c}[c\log 2]^{n}}{n!} \qquad n \ge 0$$

Thus, we see that even though the average number of decays over 2 years is (as
predicted by the deterministic model) 1.3863, there is 1 chance in 4 that there will
not be any decays, thereby indicating that such a result in no way invalidates the
original hypothesis of proton decay.
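
The two numbers in this conclusion, the mean $c\log 2$ and the probability $2^{-c}$ of no decays, follow directly from the Poisson formula. The sketch below is an illustration with a hypothetical function name; it evaluates the Poisson probabilities for c = 2.

```python
from math import exp, factorial, log


def decay_probability(n, c):
    """Poisson probability of n decays in c years, with mean c * log(2)."""
    mean = c * log(2)
    return exp(-mean) * mean ** n / factorial(n)


if __name__ == "__main__":
    c = 2
    print(c * log(2))               # expected number of decays over c years
    print(decay_probability(0, c))  # probability that no decay is observed
```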


Remark Independence is a symmetric relation. The random variables X and Y are
independent if their joint density function (or mass function in the discrete case) is
the product of their individual density (or mass) functions. Therefore, to say that X is
independent of Y is equivalent to saying that Y is independent of X, or just that X
and Y are independent. As a result, in considering whether X is independent of Y in
situations where it is not at all intuitive that knowing the value of Y will not change
the probabilities concerning X, it can be beneficial to interchange the roles of X
and Y and ask instead whether Y is independent of X. The next example illustrates
this point.


If the initial throw of the dice in the game of craps results in the sum of the dice
equaling 4, then the player will continue to throw the dice until the sum is either 4
or 7. If this sum is 4, then the player wins, and if it is 7, then the player loses. Let N
denote the number of throws needed until either 4 or 7 appears, and let X denote the
value (either 4 or 7) of the final throw. Is N independent of X? That is, does knowing
which of 4 or 7 occurs first affect the distribution of the number of throws needed
until that number appears? Most people do not find the answer to this question to
be intuitively obvious. However, suppose that we turn it around and ask whether X
is independent of N. That is, does knowing how many throws it takes to obtain a
sum of either 4 or 7 affect the probability that that sum is equal to 4? For instance,
suppose we know that it takes n throws of the dice to obtain a sum of either 4 or
7. Does this affect the probability distribution of the final sum? Clearly not, since
all that is important is that its value is either 4 or 7, and the fact that none of the
first n − 1 throws were either 4 or 7 does not change the probabilities for the nth
throw. Thus, we can conclude that X is independent of N, or equivalently, that N is
independent of X.
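
A simulation makes the independence of X and N visible: whatever the value of N, the final sum should be 4 about one-third of the time, since a sum of 4 has probability 3/36 and a sum of 7 has probability 6/36. The sketch below is illustrative; the function name, trial count, and seed are assumptions.

```python
import random
from collections import defaultdict


def simulate_final_throws(trials=200_000, seed=2):
    """For each value n of N, record [number of runs ending in 4, number of runs]."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: [0, 0])
    for _ in range(trials):
        n = 0
        while True:
            n += 1
            total = rng.randint(1, 6) + rng.randint(1, 6)
            if total in (4, 7):  # stop at the first 4 or 7
                break
        counts[n][0] += total == 4
        counts[n][1] += 1
    return counts


if __name__ == "__main__":
    counts = simulate_final_throws()
    for n in (1, 2, 3, 4):
        fours, runs = counts[n]
        print(n, fours / runs)  # estimate of P{X = 4 | N = n}
```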

As another example, let $X_1, X_2, \ldots$ be a sequence of independent and identically
distributed continuous random variables, and suppose that we observe these random
variables in sequence. If $X_n > X_i$ for each i = 1, ..., n − 1, then we say that $X_n$ is
a record value. That is, each random variable that is larger than all those preceding
it is called a record value. Let $A_n$ denote the event that $X_n$ is a record value. Is $A_{n+1}$
independent of $A_n$? That is, does knowing that the nth random variable is the largest
of the first n change the probability that the (n + 1)st random variable is the largest
of the first n + 1? While it is true that $A_{n+1}$ is independent of $A_n$, this may not be
intuitively obvious. However, if we turn the question around and ask whether $A_n$ is
independent of $A_{n+1}$, then the result is more easily understood. For knowing that
the (n + 1)st value is larger than $X_1, \ldots, X_n$ clearly gives us no information about
the relative size of $X_n$ among the first n random variables. Indeed, by symmetry, it is
clear that each of these n random variables is equally likely to be the largest of this
set, so $P(A_n \mid A_{n+1}) = P(A_n) = 1/n$. Hence, we can conclude that $A_n$ and $A_{n+1}$ are
independent events.
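
For small n this can be verified exactly. Because the variables are continuous and identically distributed, every relative ordering of the first n + 1 values is equally likely, so it suffices to enumerate permutations. The sketch below is illustrative and its function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def record_probabilities(n):
    """Return P(A_n), P(A_{n+1}) and P(A_n and A_{n+1}) by enumerating
    the (n + 1)! equally likely orderings of the first n + 1 values."""
    a_n = a_next = both = total = 0
    for ranks in permutations(range(n + 1)):
        rec_n = ranks[n - 1] == max(ranks[:n])  # X_n is the largest of the first n
        rec_next = ranks[n] == max(ranks)       # X_{n+1} is the largest of the first n + 1
        a_n += rec_n
        a_next += rec_next
        both += rec_n and rec_next
        total += 1
    return Fraction(a_n, total), Fraction(a_next, total), Fraction(both, total)


if __name__ == "__main__":
    p_n, p_next, p_both = record_probabilities(4)
    print(p_n, p_next, p_both, p_both == p_n * p_next)
```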


Remark It follows from the identity


$$\begin{aligned}
&P\{X_1 \le a_1, \ldots, X_n \le a_n\} \\
&\quad = P\{X_1 \le a_1\}P\{X_2 \le a_2 \mid X_1 \le a_1\} \cdots P\{X_n \le a_n \mid X_1 \le a_1, \ldots, X_{n-1} \le a_{n-1}\}
\end{aligned}$$

that the independence of $X_1, \ldots, X_n$ can be established sequentially. That is, we can
show that these random variables are independent by showing that


$X_2$ is independent of $X_1$
$X_3$ is independent of $X_1, X_2$
$X_4$ is independent of $X_1, X_2, X_3$
...
$X_n$ is independent of $X_1, \ldots, X_{n-1}$


6.3 Sums of Independent Random Variables 


It is often important to be able to calculate the distribution of X + Y from the
distributions of X and Y when X and Y are independent. Suppose that X and Y are
independent, continuous random variables having probability density functions $f_X$
and $f_Y$. The cumulative distribution function of X + Y is obtained as follows:


$$\begin{aligned}
F_{X+Y}(a) &= P\{X + Y \le a\} \\
&= \iint_{x+y \le a} f_X(x)f_Y(y)\,dx\,dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{a-y} f_X(x)f_Y(y)\,dx\,dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{a-y} f_X(x)\,dx\,f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} F_X(a - y)f_Y(y)\,dy
\end{aligned} \qquad (3.1)$$


The cumulative distribution function $F_{X+Y}$ is called the convolution of the distributions
$F_X$ and $F_Y$ (the cumulative distribution functions of X and Y, respectively).

By differentiating Equation (3.1), we find that the probability density function
$f_{X+Y}$ of X + Y is given by


$$\begin{aligned}
f_{X+Y}(a) &= \frac{d}{da}\int_{-\infty}^{\infty} F_X(a - y)f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} \frac{d}{da}F_X(a - y)f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} f_X(a - y)f_Y(y)\,dy
\end{aligned} \qquad (3.2)$$


6.3.1 Identically Distributed Uniform Random Variables 


It is not difficult to determine the density function of the sum of two independent 
uniform (0, 1) random variables. 


Example Sum of two independent uniform random variables


If X and Y are independent random variables, both uniformly distributed on (0, 1),
calculate the probability density of X + Y.

Solution From Equation (3.2), since


$$f_X(a) = f_Y(a) = \begin{cases} 1 & 0 < a < 1 \\ 0 & \text{otherwise} \end{cases}$$

we obtain

$$f_{X+Y}(a) = \int_{0}^{1} f_X(a - y)\,dy$$

For $0 \le a \le 1$, this yields

$$f_{X+Y}(a) = \int_{0}^{a} dy = a$$


For 1 < a < 2, we get


$$f_{X+Y}(a) = \int_{a-1}^{1} dy = 2 - a$$


Hence,

$$f_{X+Y}(a) = \begin{cases} a & 0 \le a \le 1 \\ 2 - a & 1 < a < 2 \\ 0 & \text{otherwise} \end{cases}$$


Because of the shape of its density function (see Figure 6.3), the random variable
X + Y is said to have a triangular distribution.
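
The triangular density can be checked by evaluating the convolution integral of Equation (3.2) numerically. The sketch below is illustrative: it uses a midpoint rule with an arbitrary number of steps, and the function names are hypothetical.

```python
def uniform_density(a):
    """Density of a uniform (0, 1) random variable."""
    return 1.0 if 0 < a < 1 else 0.0


def convolved_density(a, steps=10_000):
    """Midpoint-rule approximation of the integral of f_X(a - y) f_Y(y) over 0 < y < 1."""
    h = 1.0 / steps
    return sum(uniform_density(a - (k + 0.5) * h) for k in range(steps)) * h


def triangular_density(a):
    """Closed form obtained in the example."""
    if 0 <= a <= 1:
        return a
    if 1 < a < 2:
        return 2 - a
    return 0.0


if __name__ == "__main__":
    for a in (0.25, 0.5, 1.0, 1.5, 1.9):
        print(a, convolved_density(a), triangular_density(a))
```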
Figure 6.3 Triangular density function.


Now, suppose that $X_1, X_2, \ldots, X_n$ are independent uniform (0, 1) random variables,
and let


$$F_n(x) = P\{X_1 + \cdots + X_n \le x\}$$


Whereas a general formula for $F_n(x)$ is messy, it has a particularly nice form when
$x \le 1$. Indeed, we now use mathematical induction to prove that


$$F_n(x) = x^{n}/n!, \qquad 0 \le x \le 1$$

Because the preceding equation is true for n = 1, assume that


$$F_{n-1}(x) = x^{n-1}/(n-1)!, \qquad 0 \le x \le 1$$

Now, writing

$$\sum_{i=1}^{n} X_i = \sum_{i=1}^{n-1} X_i + X_n$$


and using the fact that the $X_i$ are all nonnegative, we see from Equation (3.1) that, for
$0 \le x \le 1$,


$$\begin{aligned}
F_n(x) &= \int_{0}^{1} F_{n-1}(x - y)f_{X_n}(y)\,dy \\
&= \frac{1}{(n-1)!}\int_{0}^{x} (x - y)^{n-1}\,dy \qquad \text{by the induction hypothesis} \\
&= \frac{1}{(n-1)!}\int_{0}^{x} w^{n-1}\,dw \qquad (\text{by } w = x - y) \\
&= x^{n}/n!
\end{aligned}$$


which completes the proof.

For an interesting application of the preceding formula, let us use it to determine
the expected number of independent uniform (0, 1) random variables that need to
be summed to exceed 1. That is, with $X_1, X_2, \ldots$ being independent uniform (0, 1)
random variables, we want to determine E[N], where


$$N = \min\{n : X_1 + \cdots + X_n > 1\}$$

Noting that N is greater than n > 0 if and only if $X_1 + \cdots + X_n \le 1$, we see that

$$P\{N > n\} = F_n(1) = 1/n!, \qquad n > 0$$


Because

$$P\{N > 0\} = 1 = 1/0!$$


we see that, for n > 0,


$$P\{N = n\} = P\{N > n - 1\} - P\{N > n\} = \frac{1}{(n-1)!} - \frac{1}{n!} = \frac{n-1}{n!}$$

Therefore,


$$\begin{aligned}
E[N] &= \sum_{n=1}^{\infty} \frac{n(n-1)}{n!} \\
&= \sum_{n=2}^{\infty} \frac{1}{(n-2)!} \\
&= e
\end{aligned}$$

That is, the mean number of independent uniform (0, 1) random variables that must 
be summed for the sum to exceed 1 is equal to e. 
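
Both the series for E[N] and the claim itself are easy to check numerically. The sketch below is illustrative; the function names, trial count, and seed are assumptions. It sums the series $\sum n(n-1)/n!$ and also estimates E[N] by simulation.

```python
import math
import random


def series_mean(terms=30):
    """Partial sum of E[N] = sum over n >= 1 of n * (n - 1) / n!."""
    return sum(n * (n - 1) / math.factorial(n) for n in range(1, terms))


def draws_to_exceed_one(rng):
    """Number of uniform (0, 1) draws needed for the running sum to exceed 1."""
    total, n = 0.0, 0
    while total <= 1:
        total += rng.random()
        n += 1
    return n


def estimate_mean(trials=200_000, seed=3):
    rng = random.Random(seed)
    return sum(draws_to_exceed_one(rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(series_mean(), estimate_mean(), math.e)
```
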
6.3.2 Gamma Random Variables 


Recall that a gamma random variable has a density of the form


$$f(y) = \frac{\lambda e^{-\lambda y}(\lambda y)^{t-1}}{\Gamma(t)} \qquad 0 < y < \infty$$

An important property of this family of distributions is that for a fixed value of $\lambda$, it
is closed under convolutions.


Proposition 3.1 If X and Y are independent gamma random variables with respective parameters
$(s, \lambda)$ and $(t, \lambda)$, then X + Y is a gamma random variable with parameters $(s + t, \lambda)$.


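Proof Using Equation (3.2), we obtain

$$\begin{aligned}
f_{X+Y}(a) &= \frac{1}{\Gamma(s)\Gamma(t)}\int_{0}^{a} \lambda e^{-\lambda(a-y)}[\lambda(a-y)]^{s-1}\,\lambda e^{-\lambda y}(\lambda y)^{t-1}\,dy \\
&= Ke^{-\lambda a}\int_{0}^{a} (a - y)^{s-1}y^{t-1}\,dy \\
&= Ke^{-\lambda a}a^{s+t-1}\int_{0}^{1} (1 - x)^{s-1}x^{t-1}\,dx \qquad \text{by letting } x = \frac{y}{a} \\
&= Ce^{-\lambda a}a^{s+t-1}
\end{aligned}$$


where C is a constant that does not depend on a. But, as the preceding is a density
function and thus must integrate to 1, the value of C is determined, and we have


$$f_{X+Y}(a) = \frac{\lambda e^{-\lambda a}(\lambda a)^{s+t-1}}{\Gamma(s + t)}$$


Hence, the result is proved.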


It is now a simple matter to establish, by using Proposition 3.1 and induction,
that if $X_i$, i = 1, ..., n are independent gamma random variables with respective
parameters $(t_i, \lambda)$, i = 1, ..., n, then $\sum_{i=1}^{n} X_i$ is gamma with parameters
$\left(\sum_{i=1}^{n} t_i, \lambda\right)$. We leave the proof of this statement as an exercise.


Example 3b Let $X_1, X_2, \ldots, X_n$ be n independent exponential random variables, each having
parameter $\lambda$. Then, since an exponential random variable with parameter $\lambda$ is the
same as a gamma random variable with parameters $(1, \lambda)$, it follows from
Proposition 3.1 that $X_1 + X_2 + \cdots + X_n$ is a gamma random variable with parameters
$(n, \lambda)$.
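
The claim of Example 3b can be illustrated by simulation. The sketch below compares the empirical distribution of a sum of n exponentials with the gamma $(n, \lambda)$ distribution function. Note the assumptions: the closed form used in `erlang_cdf` is the standard expression for integer n (it is not derived in this section), and the function names, parameter values, and seed are hypothetical.

```python
import random
from math import exp, factorial


def erlang_cdf(a, n, lam):
    """CDF of a gamma (n, lam) random variable for integer n (standard closed form)."""
    return 1 - sum(exp(-lam * a) * (lam * a) ** k / factorial(k) for k in range(n))


def simulated_cdf(a, n, lam, trials=100_000, seed=4):
    """Fraction of simulated sums of n exponential(lam) variables that are <= a."""
    rng = random.Random(seed)
    hits = sum(
        sum(rng.expovariate(lam) for _ in range(n)) <= a for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    n, lam, a = 3, 2.0, 1.5  # illustrative values
    print(erlang_cdf(a, n, lam), simulated_cdf(a, n, lam))
```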
If $Z_1, Z_2, \ldots, Z_n$ are independent standard normal random variables, then
$Y = \sum_{i=1}^{n} Z_i^2$ is said to have the chi-squared (sometimes seen as $\chi^2$) distribution with n
degrees of freedom. Let us compute the density function of Y. When n = 1, $Y = Z_1^2$,
and from Example 7b of Chapter 5, we see that its probability density function is
given by


$$\begin{aligned}
f_{Z^2}(y) &= \frac{1}{2\sqrt{y}}\left[f_Z(\sqrt{y}) + f_Z(-\sqrt{y})\right] \\
&= \frac{1}{2\sqrt{y}}\,\frac{2}{\sqrt{2\pi}}\,e^{-y/2} \\
&= \frac{\frac{1}{2}e^{-y/2}(y/2)^{1/2-1}}{\sqrt{\pi}}
\end{aligned}$$


But we recognize the preceding as the gamma distribution with parameters $\left(\frac{1}{2}, \frac{1}{2}\right)$.
[A by-product of this analysis is that $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$.] But since each $Z_i^2$ is gamma
$\left(\frac{1}{2}, \frac{1}{2}\right)$, it follows from Proposition 3.1 that the chi-squared distribution with n degrees
of freedom is just the gamma distribution with parameters $\left(n/2, \frac{1}{2}\right)$ and hence has a
probability density function given by


$$f_Y(y) = \frac{\frac{1}{2}e^{-y/2}\left(\frac{y}{2}\right)^{n/2-1}}{\Gamma\left(\frac{n}{2}\right)} = \frac{e^{-y/2}y^{n/2-1}}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)} \qquad y > 0$$


When n is an even integer, $\Gamma(n/2) = [(n/2) - 1]!$, whereas when n is odd, $\Gamma(n/2)$ can
be obtained from iterating the relationship $\Gamma(t) = (t - 1)\Gamma(t - 1)$ and then using
the previously obtained result that $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$. [For instance, $\Gamma\left(\frac{5}{2}\right) = \frac{3}{2}\Gamma\left(\frac{3}{2}\right) =
\frac{3}{2}\,\frac{1}{2}\,\Gamma\left(\frac{1}{2}\right) = \frac{3}{4}\sqrt{\pi}$.]

In practice, the chi-squared distribution often arises as the distribution of the
square of the error involved when one attempts to hit a target in n-dimensional space
when the coordinate errors are taken to be independent standard normal random
variables. It is also important in statistical analysis.


6.3.3 Normal Random Variables


We can also use Equation (3.2) to prove the following important result about normal
random variables.


Proposition 3.2 If $X_i$, i = 1, ..., n, are independent random variables that are normally distributed
with respective parameters $\mu_i, \sigma_i^2$, i = 1, ..., n, then $\sum_{i=1}^{n} X_i$ is normally distributed
with parameters $\sum_{i=1}^{n} \mu_i$ and $\sum_{i=1}^{n} \sigma_i^2$.


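Proof of Proposition 3.2: To begin, let X and Y be independent normal random
variables with X having mean 0 and variance $\sigma^2$ and Y having mean 0 and variance
1. We will determine the density function of X + Y by utilizing Equation (3.2).
Now, with

$$c = \frac{1}{2\sigma^2} + \frac{1}{2} = \frac{1 + \sigma^2}{2\sigma^2}$$


we have


$$\begin{aligned}
f_X(a - y)f_Y(y) &= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(a - y)^2}{2\sigma^2}\right\}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{y^2}{2}\right\} \\
&= \frac{1}{2\pi\sigma}\exp\left\{-\frac{a^2}{2\sigma^2}\right\}\exp\left\{-c\left(y^2 - 2y\frac{a}{1 + \sigma^2}\right)\right\}
\end{aligned}$$

where the preceding follows because $\dfrac{2c}{1 + \sigma^2} = \dfrac{1}{\sigma^2}$. Now,


$$\begin{aligned}
\frac{a^2}{2\sigma^2} + c\left(y^2 - 2y\frac{a}{1 + \sigma^2}\right)
&= \frac{a^2}{2\sigma^2} + c\left(y - \frac{a}{1 + \sigma^2}\right)^2 - c\,\frac{a^2}{(1 + \sigma^2)^2} \\
&= \frac{a^2}{2\sigma^2} + c\left(y - \frac{a}{1 + \sigma^2}\right)^2 - \frac{a^2}{2\sigma^2(1 + \sigma^2)} \\
&= \frac{a^2}{2\sigma^2}\left(1 - \frac{1}{1 + \sigma^2}\right) + c\left(y - \frac{a}{1 + \sigma^2}\right)^2 \\
&= \frac{a^2}{2(1 + \sigma^2)} + c\left(y - \frac{a}{1 + \sigma^2}\right)^2
\end{aligned}$$


Hence,

$$f_X(a - y)f_Y(y) = \frac{1}{2\pi\sigma}\exp\left\{-\frac{a^2}{2(1 + \sigma^2)}\right\}\exp\left\{-c\left(y - \frac{a}{1 + \sigma^2}\right)^2\right\}$$


From Equation (3.2), we obtain that


$$\begin{aligned}
f_{X+Y}(a) &= \frac{1}{2\pi\sigma}\exp\left\{-\frac{a^2}{2(1 + \sigma^2)}\right\}\int_{-\infty}^{\infty}\exp\left\{-c\left(y - \frac{a}{1 + \sigma^2}\right)^2\right\}dy \\
&= \frac{1}{2\pi\sigma}\exp\left\{-\frac{a^2}{2(1 + \sigma^2)}\right\}\int_{-\infty}^{\infty}\exp\{-cx^2\}\,dx \\
&= C\exp\left\{-\frac{a^2}{2(1 + \sigma^2)}\right\}
\end{aligned}$$


where C does not depend on a. But this implies that X + Y is normal with mean 0
and variance $1 + \sigma^2$.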

Now, suppose that $X_1$ and $X_2$ are independent normal random variables with $X_i$
having mean $\mu_i$ and variance $\sigma_i^2$, i = 1, 2. Then


$$X_1 + X_2 = \sigma_2\left(\frac{X_1 - \mu_1}{\sigma_2} + \frac{X_2 - \mu_2}{\sigma_2}\right) + \mu_1 + \mu_2$$


But since $(X_1 - \mu_1)/\sigma_2$ is normal with mean 0 and variance $\sigma_1^2/\sigma_2^2$, and $(X_2 - \mu_2)/\sigma_2$
is normal with mean 0 and variance 1, it follows from our previous result that
$(X_1 - \mu_1)/\sigma_2 + (X_2 - \mu_2)/\sigma_2$ is normal with mean 0 and variance $1 + \sigma_1^2/\sigma_2^2$, implying
that $X_1 + X_2$ is normal with mean $\mu_1 + \mu_2$ and variance $\sigma_2^2(1 + \sigma_1^2/\sigma_2^2) = \sigma_1^2 + \sigma_2^2$.

Thus, Proposition 3.2 is established when n = 2. The general case now follows by
induction. That is, assume that Proposition 3.2 is true when there are n − 1 random
variables. Now consider the case of n, and write


$$\sum_{i=1}^{n} X_i = \sum_{i=1}^{n-1} X_i + X_n$$

By the induction hypothesis, $\sum_{i=1}^{n-1} X_i$ is normal with mean $\sum_{i=1}^{n-1} \mu_i$ and variance $\sum_{i=1}^{n-1} \sigma_i^2$.
Therefore, by the result for n = 2, $\sum_{i=1}^{n} X_i$ is normal with mean $\sum_{i=1}^{n} \mu_i$ and variance
$\sum_{i=1}^{n} \sigma_i^2$.

Example 3c A basketball team will play a 44-game season. Twenty-six of these games are against
class A teams and 18 are against class B teams. Suppose that the team will win each
game against a class A team with probability .4 and will win each game against a
class B team with probability .7. Suppose also that the results of the different games
are independent. Approximate the probability that

(a) the team wins 25 games or more; 
(b) the team wins more games against class A teams than it does against class B 
teams. 


Solution (a) Let $X_A$ and $X_B$ respectively denote the number of games the team wins
against class A and against class B teams. Note that $X_A$ and $X_B$ are independent
binomial random variables and

\[
\begin{aligned}
E[X_A] &= 26(.4) = 10.4 & \mathrm{Var}(X_A) &= 26(.4)(.6) = 6.24 \\
E[X_B] &= 18(.7) = 12.6 & \mathrm{Var}(X_B) &= 18(.7)(.3) = 3.78
\end{aligned}
\]

By the normal approximation to the binomial, $X_A$ and $X_B$ will have
approximately the same distribution as would independent normal random variables
with the preceding expected values and variances. Hence, by Proposition 3.2,
$X_A + X_B$ will have approximately a normal distribution with mean 23 and variance 10.02.
Therefore, letting $Z$ denote a standard normal random variable, we have

\[
\begin{aligned}
P\{X_A + X_B \ge 25\} &= P\{X_A + X_B \ge 24.5\} \\
&= P\left\{\frac{X_A + X_B - 23}{\sqrt{10.02}} \ge \frac{24.5 - 23}{\sqrt{10.02}}\right\} \\
&\approx P\left\{Z \ge \frac{1.5}{\sqrt{10.02}}\right\} \\
&\approx 1 - P\{Z < .4739\} \\
&\approx .3178
\end{aligned}
\]


(b) We note that $X_A - X_B$ will have approximately a normal distribution with
mean $-2.2$ and variance 10.02. Hence,

\[
\begin{aligned}
P\{X_A - X_B \ge 1\} &= P\{X_A - X_B \ge .5\} \\
&= P\left\{\frac{X_A - X_B + 2.2}{\sqrt{10.02}} \ge \frac{.5 + 2.2}{\sqrt{10.02}}\right\} \\
&\approx P\left\{Z \ge \frac{2.7}{\sqrt{10.02}}\right\} \\
&\approx 1 - P\{Z < .8530\} \\
&\approx .1968
\end{aligned}
\]

Therefore, there is approximately a 31.78 percent chance that the team will win at
least 25 games and approximately a 19.68 percent chance that it will win more games
against class A teams than against class B teams.
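The two approximations above can be reproduced numerically. The following is a minimal sketch, assuming the standard normal distribution function is computed from `math.erf`; the helper names `std_normal_cdf` and `normal_approx_tail` are illustrative and not part of the text.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function, written in terms of erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_approx_tail(mean, var, threshold):
    """Approximate P{W >= threshold} for an integer-valued W with the given
    mean and variance, using the continuity correction threshold - 0.5."""
    z = (threshold - 0.5 - mean) / math.sqrt(var)
    return 1.0 - std_normal_cdf(z)

# Means and variances of the two independent binomial win counts.
mean_a, var_a = 26 * 0.4, 26 * 0.4 * 0.6
mean_b, var_b = 18 * 0.7, 18 * 0.7 * 0.3

# (a) P{X_A + X_B >= 25}; (b) P{X_A - X_B >= 1}.
p_a = normal_approx_tail(mean_a + mean_b, var_a + var_b, 25)
p_b = normal_approx_tail(mean_a - mean_b, var_a + var_b, 1)
print(round(p_a, 4), round(p_b, 4))
```

Note that the variance of the difference is the sum of the variances, which is why both parts use 10.02.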
The random variable $Y$ is said to be a lognormal random variable with parameters
$\mu$ and $\sigma$ if $\log(Y)$ is a normal random variable with mean $\mu$ and variance $\sigma^2$.
That is, $Y$ is lognormal if it can be expressed as
\[
Y = e^{X}
\]
where $X$ is a normal random variable.


Example 3d
Starting at some fixed time, let $S(n)$ denote the price of a certain security at the
end of $n$ additional weeks, $n \ge 1$. A popular model for the evolution of these prices
assumes that the price ratios $S(n)/S(n-1)$, $n \ge 1$, are independent and identically
distributed lognormal random variables. Assuming this model, with parameters
$\mu = .0165$, $\sigma = .0730$, what is the probability that

(a) the price of the security increases over each of the next two weeks?
(b) the price at the end of two weeks is higher than it is today?


Solution Let $Z$ be a standard normal random variable. To solve part (a), we use the
fact that $\log(x)$ increases in $x$ to conclude that $x > 1$ if and only if $\log(x) > \log(1) = 0$.
As a result, we have

\[
\begin{aligned}
P\left\{\frac{S(1)}{S(0)} > 1\right\} &= P\left\{\log\left(\frac{S(1)}{S(0)}\right) > 0\right\} \\
&= P\left\{Z > \frac{-.0165}{.0730}\right\} \\
&= P\{Z < .2260\} \\
&= .5894
\end{aligned}
\]

In other words, the probability that the price is up after one week is .5894. Since the
successive price ratios are independent, the probability that the price increases over
each of the next two weeks is $(.5894)^2 = .3474$.

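To solve part (b), we reason as follows:

\[
\begin{aligned}
P\left\{\frac{S(2)}{S(0)} > 1\right\} &= P\left\{\frac{S(2)}{S(1)}\,\frac{S(1)}{S(0)} > 1\right\} \\
&= P\left\{\log\left(\frac{S(2)}{S(1)}\right) + \log\left(\frac{S(1)}{S(0)}\right) > 0\right\}
\end{aligned}
\]

However, $\log\left(\frac{S(2)}{S(1)}\right) + \log\left(\frac{S(1)}{S(0)}\right)$, being the sum of two
independent normal random variables with a common mean .0165 and a common standard
deviation .0730, is a normal random variable with mean .0330 and variance $2(.0730)^2$.
Consequently,

\[
\begin{aligned}
P\left\{\frac{S(2)}{S(0)} > 1\right\} &= P\left\{Z > \frac{-.0330}{.0730\sqrt{2}}\right\} \\
&= P\{Z < .31965\} \\
&= .6254
\end{aligned}
\]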
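As a check on the arithmetic in this example, here is a short sketch that evaluates the three probabilities directly. It assumes the standard normal distribution function is obtained from `math.erf`; the variable names are illustrative only.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function, written in terms of erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 0.0165, 0.0730

# log(S(n)/S(n-1)) is normal(mu, sigma^2), so the weekly ratio exceeds 1
# exactly when that normal variable exceeds 0.
p_up_one_week = 1.0 - std_normal_cdf((0.0 - mu) / sigma)

# (a) Independent weekly ratios: both weeks up.
p_up_both_weeks = p_up_one_week ** 2

# (b) log(S(2)/S(0)) is normal with mean 2*mu and variance 2*sigma^2.
p_up_after_two = 1.0 - std_normal_cdf((0.0 - 2 * mu) / (sigma * math.sqrt(2.0)))

print(round(p_up_one_week, 4), round(p_up_both_weeks, 4), round(p_up_after_two, 4))
```

Part (b) is larger than part (a) because the price can end higher after two weeks even if it fell in one of them.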
6.3.4 Poisson and Binomial Random Variables

Rather than attempt to derive a general expression for the distribution of $X + Y$ in
the discrete case, we shall consider some examples.

Example 3e
Sums of independent Poisson random variables

If $X$ and $Y$ are independent Poisson random variables with respective parameters $\lambda_1$
and $\lambda_2$, compute the distribution of $X + Y$.


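Solution Because the event $\{X + Y = n\}$ may be written as the union of the disjoint
events $\{X = k, Y = n - k\}$, $0 \le k \le n$, we have

\[
\begin{aligned}
P\{X + Y = n\} &= \sum_{k=0}^{n} P\{X = k, Y = n - k\} \\
&= \sum_{k=0}^{n} P\{X = k\}P\{Y = n - k\} \\
&= \sum_{k=0}^{n} e^{-\lambda_1}\frac{\lambda_1^{k}}{k!}\, e^{-\lambda_2}\frac{\lambda_2^{n-k}}{(n-k)!} \\
&= e^{-(\lambda_1 + \lambda_2)} \sum_{k=0}^{n} \frac{\lambda_1^{k}\lambda_2^{n-k}}{k!\,(n-k)!} \\
&= \frac{e^{-(\lambda_1 + \lambda_2)}}{n!} \sum_{k=0}^{n} \frac{n!}{k!\,(n-k)!}\lambda_1^{k}\lambda_2^{n-k} \\
&= \frac{e^{-(\lambda_1 + \lambda_2)}}{n!}(\lambda_1 + \lambda_2)^{n}
\end{aligned}
\]

Thus, $X + Y$ has a Poisson distribution with parameter $\lambda_1 + \lambda_2$.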
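The convolution sum above can be checked numerically. The sketch below compares the sum over $k$ with the Poisson mass function with parameter $\lambda_1 + \lambda_2$; the parameter values 1.5 and 2.25 are arbitrary choices for illustration.

```python
import math

def poisson_pmf(lam, n):
    """P{N = n} for a Poisson random variable with parameter lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

def sum_pmf(lam1, lam2, n):
    """P{X + Y = n} computed as the sum over k of P{X = k} P{Y = n - k}."""
    return sum(poisson_pmf(lam1, k) * poisson_pmf(lam2, n - k) for k in range(n + 1))

lam1, lam2 = 1.5, 2.25  # illustrative parameters
max_gap = max(
    abs(sum_pmf(lam1, lam2, n) - poisson_pmf(lam1 + lam2, n)) for n in range(20)
)
print(max_gap)  # difference is at the level of floating-point rounding
```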


Example 3f
Sums of independent binomial random variables

Let $X$ and $Y$ be independent binomial random variables with respective parameters
$(n, p)$ and $(m, p)$. Calculate the distribution of $X + Y$.

Solution Recalling the interpretation of a binomial random variable, and without
any computation at all, we can immediately conclude that $X + Y$ is binomial with
parameters $(n + m, p)$. This follows because $X$ represents the number of successes in
$n$ independent trials, each of which results in a success with probability $p$; similarly,
$Y$ represents the number of successes in $m$ independent trials, each of which results
in a success with probability $p$. Hence, given that $X$ and $Y$ are assumed independent,
it follows that $X + Y$ represents the number of successes in $n + m$ independent
trials when each trial has a probability $p$ of resulting in a success. Therefore, $X + Y$
is a binomial random variable with parameters $(n + m, p)$. To check this conclusion
analytically, note that

\[
\begin{aligned}
P\{X + Y = k\} &= \sum_{i=0}^{n} P\{X = i, Y = k - i\} \\
&= \sum_{i=0}^{n} P\{X = i\}P\{Y = k - i\} \\
&= \sum_{i=0}^{n} \binom{n}{i} p^{i} q^{n-i} \binom{m}{k-i} p^{k-i} q^{m-k+i}
\end{aligned}
\]

where $q = 1 - p$ and where $\binom{r}{j} = 0$ when $j < 0$. Thus,

\[
P\{X + Y = k\} = p^{k} q^{n+m-k} \sum_{i=0}^{n} \binom{n}{i}\binom{m}{k-i}
\]

and the conclusion follows upon application of the combinatorial identity

\[
\binom{n+m}{k} = \sum_{i=0}^{n} \binom{n}{i}\binom{m}{k-i}
\]
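The analytic check can also be carried out exactly with rational arithmetic. This sketch, with illustrative values $n = 4$, $m = 6$, $p = 1/3$, confirms that the convolution of the two binomial mass functions equals the binomial mass function with parameters $(n + m, p)$; the helper names are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, p, k):
    """P{N = k} for a binomial (n, p) random variable; 0 outside 0..n."""
    if k < 0 or k > n:
        return Fraction(0)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def sum_pmf(n, m, p, k):
    """P{X + Y = k} as the sum over i of P{X = i} P{Y = k - i}."""
    return sum(binom_pmf(n, p, i) * binom_pmf(m, p, k - i) for i in range(n + 1))

n, m, p = 4, 6, Fraction(1, 3)  # illustrative parameters
matches = all(sum_pmf(n, m, p, k) == binom_pmf(n + m, p, k) for k in range(n + m + 1))
print(matches)
```

Because `Fraction` arithmetic is exact, equality here is a true identity check rather than a tolerance comparison.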


6.4 Conditional Distributions: Discrete Case 


Recall that for any two events $E$ and $F$, the conditional probability of $E$ given $F$ is
defined, provided that $P(F) > 0$, by

\[
P(E \mid F) = \frac{P(EF)}{P(F)}
\]


Hence, if $X$ and $Y$ are discrete random variables, it is natural to define the conditional
probability mass function of $X$ given that $Y = y$, by

\[
\begin{aligned}
p_{X|Y}(x|y) &= P\{X = x \mid Y = y\} \\
&= \frac{P\{X = x, Y = y\}}{P\{Y = y\}} \\
&= \frac{p(x, y)}{p_Y(y)}
\end{aligned}
\]

for all values of $y$ such that $p_Y(y) > 0$. Similarly, the conditional probability distribution
function of $X$ given that $Y = y$ is defined, for all $y$ such that $p_Y(y) > 0$, by

\[
\begin{aligned}
F_{X|Y}(x|y) &= P\{X \le x \mid Y = y\} \\
&= \sum_{a \le x} p_{X|Y}(a|y)
\end{aligned}
\]


In other words, the definitions are exactly the same as in the unconditional case,
except that everything is now conditional on the event that $Y = y$. If $X$ is independent
of $Y$, then the conditional mass function and the distribution function are the
same as the respective unconditional ones. This follows because if $X$ is independent
of $Y$, then

\[
\begin{aligned}
p_{X|Y}(x|y) &= P\{X = x \mid Y = y\} \\
&= \frac{P\{X = x, Y = y\}}{P\{Y = y\}} \\
&= \frac{P\{X = x\}P\{Y = y\}}{P\{Y = y\}} \\
&= P\{X = x\}
\end{aligned}
\]


Example 4a
Suppose that $p(x, y)$, the joint probability mass function of $X$ and $Y$, is given by

\[
p(0, 0) = .4 \qquad p(0, 1) = .2 \qquad p(1, 0) = .1 \qquad p(1, 1) = .3
\]

Calculate the conditional probability mass function of $X$ given that $Y = 1$.

Solution We first note that

\[
p_Y(1) = \sum_{x} p(x, 1) = p(0, 1) + p(1, 1) = .5
\]

Hence,

\[
p_{X|Y}(0|1) = \frac{p(0, 1)}{p_Y(1)} = \frac{2}{5}
\]

and

\[
p_{X|Y}(1|1) = \frac{p(1, 1)}{p_Y(1)} = \frac{3}{5}
\]
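The same computation can be written as a small routine that divides the joint mass function by the marginal of $Y$. This is a sketch only; the dictionary representation of the joint mass function and the function name `conditional_pmf_x_given_y` are choices made for illustration.

```python
from fractions import Fraction

# Joint mass function p(x, y) of the example, keyed by (x, y).
joint = {
    (0, 0): Fraction(4, 10),
    (0, 1): Fraction(2, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(3, 10),
}

def conditional_pmf_x_given_y(joint, y):
    """Return p_{X|Y}(x|y) = p(x, y) / p_Y(y) as a dict keyed by x."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    if p_y == 0:
        raise ValueError("conditioning event has probability 0")
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

cond = conditional_pmf_x_given_y(joint, 1)
print(cond)
```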


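Example 4b
If $X$ and $Y$ are independent Poisson random variables with respective parameters $\lambda_1$
and $\lambda_2$, calculate the conditional distribution of $X$ given that $X + Y = n$.

Solution We calculate the conditional probability mass function of $X$ given that
$X + Y = n$ as follows:

\[
\begin{aligned}
P\{X = k \mid X + Y = n\} &= \frac{P\{X = k, X + Y = n\}}{P\{X + Y = n\}} \\
&= \frac{P\{X = k, Y = n - k\}}{P\{X + Y = n\}} \\
&= \frac{P\{X = k\}P\{Y = n - k\}}{P\{X + Y = n\}}
\end{aligned}
\]

where the last equality follows from the assumed independence of $X$ and $Y$. Recalling
(Example 3e) that $X + Y$ has a Poisson distribution with parameter $\lambda_1 + \lambda_2$, we
see that the preceding equals

\[
\begin{aligned}
P\{X = k \mid X + Y = n\} &= \frac{e^{-\lambda_1}\lambda_1^{k}}{k!}\,\frac{e^{-\lambda_2}\lambda_2^{n-k}}{(n-k)!}
\left[\frac{e^{-(\lambda_1 + \lambda_2)}(\lambda_1 + \lambda_2)^{n}}{n!}\right]^{-1} \\
&= \frac{n!}{(n-k)!\,k!}\,\frac{\lambda_1^{k}\lambda_2^{n-k}}{(\lambda_1 + \lambda_2)^{n}} \\
&= \binom{n}{k}\left(\frac{\lambda_1}{\lambda_1 + \lambda_2}\right)^{k}\left(\frac{\lambda_2}{\lambda_1 + \lambda_2}\right)^{n-k}
\end{aligned}
\]

In other words, the conditional distribution of $X$ given that $X + Y = n$ is the binomial
distribution with parameters $n$ and $\lambda_1/(\lambda_1 + \lambda_2)$.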


We can also talk about joint conditional distributions, as is indicated in the next
two examples.

Example 4c
Consider the multinomial distribution with joint probability mass function

\[
P\{X_i = n_i,\ i = 1, \ldots, k\} = \frac{n!}{n_1! \cdots n_k!}\, p_1^{n_1} \cdots p_k^{n_k}, \qquad n_i \ge 0,\ \sum_{i=1}^{k} n_i = n
\]

Such a mass function results when $n$ independent trials are performed, with each
trial resulting in outcome $i$ with probability $p_i$, $\sum_{i=1}^{k} p_i = 1$. The random variables
$X_i$, $i = 1, \ldots, k$, represent, respectively, the number of trials that result in outcome $i$,
$i = 1, \ldots, k$. Suppose we are given that $n_j$ of the trials resulted in outcome $j$, for
$j = r + 1, \ldots, k$, where $\sum_{j=r+1}^{k} n_j = m \le n$. Then, because each of the other $n - m$
trials must have resulted in one of the outcomes $1, \ldots, r$, it would seem that the
conditional distribution of $X_1, \ldots, X_r$ is the multinomial distribution on $n - m$ trials
with respective trial outcome probabilities

\[
P\{\text{outcome } i \mid \text{outcome is not any of } r + 1, \ldots, k\} = \frac{p_i}{F_r}, \qquad i = 1, \ldots, r
\]

where $F_r = \sum_{i=1}^{r} p_i$ is the probability that a trial results in one of the outcomes
$1, \ldots, r$.
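Solution To verify this intuition, let $n_1, \ldots, n_r$ be such that $\sum_{i=1}^{r} n_i = n - m$. Then

\[
\begin{aligned}
P\{X_1 = n_1, \ldots, X_r = n_r \mid X_{r+1} = n_{r+1}, \ldots, X_k = n_k\}
&= \frac{P\{X_1 = n_1, \ldots, X_k = n_k\}}{P\{X_{r+1} = n_{r+1}, \ldots, X_k = n_k\}} \\
&= \frac{\dfrac{n!}{n_1! \cdots n_k!}\, p_1^{n_1} \cdots p_r^{n_r}\, p_{r+1}^{n_{r+1}} \cdots p_k^{n_k}}
{\dfrac{n!}{(n-m)!\, n_{r+1}! \cdots n_k!}\, F_r^{\,n-m}\, p_{r+1}^{n_{r+1}} \cdots p_k^{n_k}}
\end{aligned}
\]

where the probability in the denominator was obtained by regarding outcomes
$1, \ldots, r$ as a single outcome having probability $F_r$, thus showing that the probability
is a multinomial probability on $n$ trials with outcome probabilities $F_r, p_{r+1}, \ldots, p_k$.
Because $\sum_{i=1}^{r} n_i = n - m$, the preceding can be written as

\[
P\{X_1 = n_1, \ldots, X_r = n_r \mid X_{r+1} = n_{r+1}, \ldots, X_k = n_k\}
= \frac{(n-m)!}{n_1! \cdots n_r!}\left(\frac{p_1}{F_r}\right)^{n_1} \cdots \left(\frac{p_r}{F_r}\right)^{n_r}
\]

and our intuition is upheld.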


Example 4d
Consider $n$ independent trials, with each trial being a success with probability $p$.
Given a total of $k$ successes, show that all possible orderings of the $k$ successes and
$n - k$ failures are equally likely.

Solution We want to show that given a total of $k$ successes, each of the $\binom{n}{k}$ possible
orderings of $k$ successes and $n - k$ failures is equally likely. Let $X$ denote the number
of successes, and consider any ordering of $k$ successes and $n - k$ failures, say,
$o = (s, \ldots, s, f, \ldots, f)$. Then

\[
\begin{aligned}
P(o \mid X = k) &= \frac{P(o, X = k)}{P(X = k)} \\
&= \frac{P(o)}{P(X = k)} \\
&= \frac{p^{k}(1-p)^{n-k}}{\binom{n}{k} p^{k}(1-p)^{n-k}} \\
&= \frac{1}{\binom{n}{k}}
\end{aligned}
\]


6.5 Conditional Distributions: Continuous Case 


If $X$ and $Y$ have a joint probability density function $f(x, y)$, then the conditional
probability density function of $X$ given that $Y = y$ is defined, for all values of $y$ such
that $f_Y(y) > 0$, by

\[
f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)}
\]

To motivate this definition, multiply the left-hand side by $dx$ and the right-hand side
by $(dx\, dy)/dy$ to obtain

\[
\begin{aligned}
f_{X|Y}(x|y)\, dx &= \frac{f(x, y)\, dx\, dy}{f_Y(y)\, dy} \\
&\approx \frac{P\{x \le X \le x + dx,\ y \le Y \le y + dy\}}{P\{y \le Y \le y + dy\}} \\
&= P\{x \le X \le x + dx \mid y \le Y \le y + dy\}
\end{aligned}
\]

In other words, for small values of $dx$ and $dy$, $f_{X|Y}(x|y)\,dx$ represents the conditional
probability that $X$ is between $x$ and $x + dx$ given that $Y$ is between $y$ and $y + dy$.
The use of conditional densities allows us to define conditional probabilities of
events associated with one random variable when we are given the value of a second
random variable. That is, if $X$ and $Y$ are jointly continuous, then, for any set $A$,

\[
P\{X \in A \mid Y = y\} = \int_{A} f_{X|Y}(x|y)\, dx
\]

In particular, by letting $A = (-\infty, a]$ we can define the conditional cumulative
distribution function of $X$ given that $Y = y$ by

\[
F_{X|Y}(a|y) \equiv P\{X \le a \mid Y = y\} = \int_{-\infty}^{a} f_{X|Y}(x|y)\, dx
\]


The reader should note that by using the ideas presented in the preceding discussion, 
we have been able to give workable expressions for conditional probabilities, even 
though the event on which we are conditioning (namely, the event {Y = y}) has 
probability 0. 

If $X$ and $Y$ are independent continuous random variables, the conditional density
of $X$ given that $Y = y$ is just the unconditional density of $X$. This is so because,
in the independent case,

\[
f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)} = \frac{f_X(x) f_Y(y)}{f_Y(y)} = f_X(x)
\]
Example 5a
The joint density of $X$ and $Y$ is given by

\[
f(x, y) =
\begin{cases}
\frac{12}{5}\, x(2 - x - y) & 0 < x < 1,\ 0 < y < 1 \\
0 & \text{otherwise}
\end{cases}
\]

Compute the conditional density of $X$ given that $Y = y$, where $0 < y < 1$.

Solution For $0 < x < 1$, $0 < y < 1$, we have

\[
\begin{aligned}
f_{X|Y}(x|y) &= \frac{f(x, y)}{f_Y(y)} \\
&= \frac{f(x, y)}{\int_{-\infty}^{\infty} f(x, y)\, dx} \\
&= \frac{x(2 - x - y)}{\int_{0}^{1} x(2 - x - y)\, dx} \\
&= \frac{x(2 - x - y)}{\frac{2}{3} - y/2} \\
&= \frac{6x(2 - x - y)}{4 - 3y}
\end{aligned}
\]
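The closed form just obtained can be checked by dividing the joint density by a numerically integrated marginal. The following is a rough sketch that uses a midpoint rule for the integral over $x$; the step count and the evaluation points are arbitrary choices.

```python
def joint_density(x, y):
    """Joint density of Example 5a: (12/5) x (2 - x - y) on the unit square."""
    if 0 < x < 1 and 0 < y < 1:
        return 12 / 5 * x * (2 - x - y)
    return 0.0

def marginal_y(y, steps=2000):
    """f_Y(y), the integral of f(x, y) over 0 < x < 1, by the midpoint rule."""
    h = 1.0 / steps
    return sum(joint_density((i + 0.5) * h, y) for i in range(steps)) * h

def conditional_density(x, y):
    """f_{X|Y}(x|y) = f(x, y) / f_Y(y), computed numerically."""
    return joint_density(x, y) / marginal_y(y)

def closed_form(x, y):
    """The expression 6x(2 - x - y) / (4 - 3y) derived in the text."""
    return 6 * x * (2 - x - y) / (4 - 3 * y)

gap = max(
    abs(conditional_density(x, y) - closed_form(x, y))
    for x in (0.1, 0.5, 0.9)
    for y in (0.2, 0.5, 0.8)
)
print(gap)  # small; limited only by the midpoint-rule error
```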
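Example 5b
Suppose that the joint density of $X$ and $Y$ is given by

\[
f(x, y) =
\begin{cases}
\dfrac{e^{-x/y} e^{-y}}{y} & 0 < x < \infty,\ 0 < y < \infty \\
0 & \text{otherwise}
\end{cases}
\]

Find $P\{X > 1 \mid Y = y\}$.

Solution We first obtain the conditional density of $X$ given that $Y = y$.

\[
\begin{aligned}
f_{X|Y}(x|y) &= \frac{f(x, y)}{f_Y(y)} \\
&= \frac{e^{-x/y} e^{-y}/y}{e^{-y}\int_{0}^{\infty} (1/y)\, e^{-x/y}\, dx} \\
&= \frac{1}{y}\, e^{-x/y}
\end{aligned}
\]

Hence,

\[
\begin{aligned}
P\{X > 1 \mid Y = y\} &= \int_{1}^{\infty} \frac{1}{y}\, e^{-x/y}\, dx \\
&= -e^{-x/y}\Big|_{1}^{\infty} \\
&= e^{-1/y}
\end{aligned}
\]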


Example 5c
The t-distribution

If $Z$ and $Y$ are independent, with $Z$ having a standard normal distribution and $Y$ having
a chi-squared distribution with $n$ degrees of freedom, then the random variable
$T$ defined by

\[
T = \frac{Z}{\sqrt{Y/n}} = \sqrt{n}\,\frac{Z}{\sqrt{Y}}
\]

is said to have a t-distribution with $n$ degrees of freedom. As will be seen in Section 7.8,
the t-distribution has important applications in statistical inference. At present, we
will content ourselves with computing its density function. This will be accomplished
by using the conditional density of $T$ given $Y$ to obtain the joint density function of $T$
and $Y$, from which we will then obtain the marginal density of $T$. To begin, note that
because of the independence of $Z$ and $Y$, it follows that the conditional distribution
of $T$ given that $Y = y$ is the distribution of $\sqrt{n/y}\, Z$, which is normal with mean 0 and
variance $n/y$. Hence, the conditional density of $T$ given that $Y = y$ is

\[
f_{T|Y}(t|y) = \frac{1}{\sqrt{2\pi n/y}}\, e^{-t^2 y/2n}, \qquad -\infty < t < \infty
\]


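Using the preceding, along with the following formula for the chi-squared density
given in Example 3b of this chapter,

\[
f_Y(y) = \frac{e^{-y/2}\, y^{n/2 - 1}}{2^{n/2}\,\Gamma(n/2)}, \qquad y > 0
\]

we obtain that the joint density of $T$, $Y$ is

\[
\begin{aligned}
f_{T,Y}(t, y) &= \frac{1}{\sqrt{2\pi n}\; 2^{n/2}\,\Gamma(n/2)}\, e^{-t^2 y/2n}\, e^{-y/2}\, y^{(n-1)/2} \\
&= \frac{1}{\sqrt{\pi n}\; 2^{(n+1)/2}\,\Gamma(n/2)}\, e^{-\frac{t^2 + n}{2n}\, y}\, y^{(n-1)/2}, \qquad y > 0,\ -\infty < t < \infty
\end{aligned}
\]

Letting $c = \dfrac{t^2 + n}{2n}$ and integrating the preceding over all $y$ gives

\[
\begin{aligned}
f_T(t) &= \int_{0}^{\infty} f_{T,Y}(t, y)\, dy \\
&= \frac{1}{\sqrt{\pi n}\; 2^{(n+1)/2}\,\Gamma(n/2)} \int_{0}^{\infty} e^{-cy}\, y^{(n-1)/2}\, dy \\
&= \frac{c^{-(n+1)/2}}{\sqrt{\pi n}\; 2^{(n+1)/2}\,\Gamma(n/2)} \int_{0}^{\infty} e^{-x}\, x^{(n-1)/2}\, dx \qquad (\text{by letting } x = cy) \\
&= \frac{n^{(n+1)/2}\,\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{\pi n}\,(t^2 + n)^{(n+1)/2}\,\Gamma\!\left(\frac{n}{2}\right)} \qquad \left(\text{because } \frac{1}{c} = \frac{2n}{t^2 + n}\right) \\
&= \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{\pi n}\,\Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \qquad -\infty < t < \infty
\end{aligned}
\]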

Example 5d
The bivariate normal distribution

One of the most important joint distributions is the bivariate normal distribution.
We say that the random variables $X$, $Y$ have a bivariate normal distribution if, for
constants $\mu_x$, $\mu_y$, $\sigma_x > 0$, $\sigma_y > 0$, $-1 < \rho < 1$, their joint density function is given,
for all $-\infty < x, y < \infty$, by

\[
f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}}
\exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_x}{\sigma_x}\right)^2
+ \left(\frac{y - \mu_y}{\sigma_y}\right)^2
- 2\rho\,\frac{(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y}\right]\right\}
\]


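We now determine the conditional density of $X$ given that $Y = y$. In doing so, we
will continually collect all factors that do not depend on $x$ and represent them by the
constants $C_i$. The final constant will then be found by using that
$\int_{-\infty}^{\infty} f_{X|Y}(x|y)\, dx = 1$. We have

\[
\begin{aligned}
f_{X|Y}(x|y) &= \frac{f(x, y)}{f_Y(y)} \\
&= C_1 f(x, y) \\
&= C_2 \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_x}{\sigma_x}\right)^2 - 2\rho\,\frac{x(y - \mu_y)}{\sigma_x\sigma_y}\right]\right\} \\
&= C_3 \exp\left\{-\frac{1}{2\sigma_x^2(1 - \rho^2)}\left[x^2 - 2x\left(\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y)\right)\right]\right\} \\
&= C_4 \exp\left\{-\frac{1}{2\sigma_x^2(1 - \rho^2)}\left[x - \left(\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y)\right)\right]^2\right\}
\end{aligned}
\]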


Recognizing the preceding equation as a normal density, we can conclude that given
$Y = y$, the random variable $X$ is normally distributed with mean
$\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y)$ and variance $\sigma_x^2(1 - \rho^2)$. Also, because the
joint density of $Y$, $X$ is exactly the same as that of $X$, $Y$, except that $\mu_x$, $\sigma_x$ are
interchanged with $\mu_y$, $\sigma_y$, it similarly follows that the conditional distribution of $Y$
given $X = x$ is the normal distribution with mean $\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x)$ and
variance $\sigma_y^2(1 - \rho^2)$. It follows from these results that
the necessary and sufficient condition for the bivariate normal random variables $X$
and $Y$ to be independent is that $\rho = 0$ (a result that also follows directly from their
joint density, because it is only when $\rho = 0$ that the joint density factors into two
terms, one depending only on $x$ and the other only on $y$).
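With $C = \dfrac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}}$, the marginal density of $X$ can be obtained from

\[
\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f(x, y)\, dy \\
&= C \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_x}{\sigma_x}\right)^2
+ \left(\frac{y - \mu_y}{\sigma_y}\right)^2
- 2\rho\,\frac{(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y}\right]\right\} dy
\end{aligned}
\]

Now, with $w = \dfrac{y - \mu_y}{\sigma_y}$,

\[
\begin{aligned}
\left(\frac{x - \mu_x}{\sigma_x}\right)^2 + \left(\frac{y - \mu_y}{\sigma_y}\right)^2
- 2\rho\,\frac{(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y}
&= \left(\frac{x - \mu_x}{\sigma_x}\right)^2 + w^2 - 2\rho\,\frac{(x - \mu_x)\,w}{\sigma_x} \\
&= \left(w - \frac{\rho(x - \mu_x)}{\sigma_x}\right)^2 + (1 - \rho^2)\left(\frac{x - \mu_x}{\sigma_x}\right)^2
\end{aligned}
\]

Hence, making the change of variable $w = \dfrac{y - \mu_y}{\sigma_y}$ yields that

\[
\begin{aligned}
f_X(x) &= C\sigma_y\, e^{-(x - \mu_x)^2/2\sigma_x^2} \int_{-\infty}^{\infty}
\exp\left\{-\frac{1}{2(1 - \rho^2)}\left(w - \frac{\rho(x - \mu_x)}{\sigma_x}\right)^2\right\} dw \\
&= C\sigma_y\, e^{-(x - \mu_x)^2/2\sigma_x^2} \int_{-\infty}^{\infty}
\exp\left\{-\frac{v^2}{2(1 - \rho^2)}\right\} dv \qquad \left(\text{by letting } v = w - \frac{\rho(x - \mu_x)}{\sigma_x}\right) \\
&= K e^{-(x - \mu_x)^2/2\sigma_x^2}
\end{aligned}
\]

where $K$ does not depend on $x$. But this shows that $X$ is normal with mean $\mu_x$
and variance $\sigma_x^2$. Similarly, we can show that $Y$ is normal with mean $\mu_y$ and variance
$\sigma_y^2$.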


We can also talk about conditional distributions when the random variables are
neither jointly continuous nor jointly discrete. For example, suppose that $X$ is a continuous
random variable having probability density function $f$ and $N$ is a discrete
random variable, and consider the conditional distribution of $X$ given that $N = n$.
Then

\[
\frac{P\{x < X < x + dx \mid N = n\}}{dx}
= \frac{P\{N = n \mid x < X < x + dx\}}{P\{N = n\}}\,\frac{P\{x < X < x + dx\}}{dx}
\]

and letting $dx$ approach 0 gives

\[
\lim_{dx \to 0} \frac{P\{x < X < x + dx \mid N = n\}}{dx} = \frac{P\{N = n \mid X = x\}}{P\{N = n\}}\, f(x)
\]

thus showing that the conditional density of $X$ given that $N = n$ is given by

\[
f_{X|N}(x|n) = \frac{P\{N = n \mid X = x\}}{P\{N = n\}}\, f(x)
\]


Consider $n + m$ trials having a common probability of success. Suppose, however,
that this success probability is not fixed in advance but is chosen from a uniform
(0, 1) population. What is the conditional distribution of the success probability
given that the $n + m$ trials result in $n$ successes?

Solution If we let $X$ denote the probability that a given trial is a success, then $X$
is a uniform (0, 1) random variable. Also, given that $X = x$, the $n + m$ trials are
independent with common probability of success $x$, so $N$, the number of successes,
is a binomial random variable with parameters $(n + m, x)$. Hence, the conditional
density of $X$ given that $N = n$ is

\[
\begin{aligned}
f_{X|N}(x|n) &= \frac{P\{N = n \mid X = x\}\, f_X(x)}{P\{N = n\}} \\
&= \frac{\binom{n+m}{n}\, x^{n}(1 - x)^{m}}{P\{N = n\}}, \qquad 0 < x < 1 \\
&= c\, x^{n}(1 - x)^{m}
\end{aligned}
\]

where $c$ does not depend on $x$. Thus, the conditional density is that of a beta random
variable with parameters $n + 1$, $m + 1$.

The preceding result is quite interesting, for it states that if the original or prior
(to the collection of data) distribution of a trial success probability is uniformly
distributed over (0, 1) [or, equivalently, is beta with parameters (1, 1)], then the
posterior (or conditional) distribution given a total of $n$ successes in $n + m$ trials is
beta with parameters $(1 + n, 1 + m)$. This is valuable, for it enhances our intuition
as to what it means to assume that a random variable has a beta distribution.
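To see the constant $c$ concretely, the following sketch compares the beta density with parameters $(n + 1, m + 1)$ against the ratio $P\{N = n \mid X = x\} f_X(x)/P\{N = n\}$, where $P\{N = n\}$ is obtained by numerically integrating the binomial probability against the uniform prior. The values $n = 3$, $m = 5$ and the midpoint-rule step count are illustrative assumptions.

```python
from math import comb, factorial

def beta_density(x, n, m):
    """Beta (n + 1, m + 1) density: (n+m+1)! / (n! m!) * x^n (1-x)^m."""
    c = factorial(n + m + 1) / (factorial(n) * factorial(m))
    return c * x ** n * (1 - x) ** m

def likelihood(x, n, m):
    """P{N = n | X = x} for a binomial (n + m, x) count of successes."""
    return comb(n + m, n) * x ** n * (1 - x) ** m

def bayes_density(x, n, m, steps=20000):
    """P{N = n | X = x} f_X(x) / P{N = n} with a uniform (0, 1) prior,
    where P{N = n} is computed by the midpoint rule."""
    h = 1.0 / steps
    p_n = sum(likelihood((i + 0.5) * h, n, m) for i in range(steps)) * h
    return likelihood(x, n, m) / p_n

n, m = 3, 5  # illustrative counts of successes and failures
gap = max(abs(beta_density(x, n, m) - bayes_density(x, n, m)) for x in (0.1, 0.3, 0.5, 0.7))
print(gap)
```

A by-product of this computation is that $P\{N = n\} = 1/(n + m + 1)$ under the uniform prior, so every possible number of successes is equally likely.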


We are often interested in the conditional distribution of a random variable $X$
given that $X$ lies in some set $A$. When $X$ is discrete, the conditional probability mass
function is given by

\[
P(X = x \mid X \in A) = \frac{P(X = x, X \in A)}{P(X \in A)} =
\begin{cases}
\dfrac{p(x)}{P(X \in A)}, & \text{if } x \in A \\
0, & \text{if } x \notin A
\end{cases}
\]

Similarly, when $X$ is continuous with density function $f$, the conditional density function
of $X$ given that $X \in A$ is

\[
f_{X|X \in A}(x) = \frac{f(x)}{P(X \in A)} = \frac{f(x)}{\int_{A} f(y)\, dy}, \qquad x \in A
\]


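A Pareto random variable with positive parameters $a$, $\lambda$ has distribution function

\[
F(x) = 1 - a^{\lambda} x^{-\lambda}, \qquad x > a
\]

and density function

\[
f(x) = \lambda a^{\lambda} x^{-\lambda - 1}, \qquad x > a
\]

An important feature of Pareto distributions is that for $x_0 > a$ the conditional distribution
of a Pareto random variable $X$ with parameters $a$ and $\lambda$, given that it exceeds
$x_0$, is the Pareto distribution with parameters $x_0$ and $\lambda$. This follows because

\[
f_{X|X > x_0}(x) = \frac{f(x)}{P\{X > x_0\}} = \frac{\lambda a^{\lambda} x^{-\lambda - 1}}{a^{\lambda} x_0^{-\lambda}}
= \lambda x_0^{\lambda} x^{-\lambda - 1}, \qquad x > x_0
\]

thus verifying that the conditional distribution is Pareto with parameters $x_0$ and $\lambda$.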
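The same property can be stated in terms of tail probabilities: for $x > x_0 > a$, $P\{X > x \mid X > x_0\} = (x_0/x)^{\lambda}$, which is the Pareto tail with parameters $x_0$ and $\lambda$. A brief sketch, with arbitrary illustrative parameter values:

```python
def pareto_tail(x, a, lam):
    """P{X > x} for a Pareto random variable with parameters a and lam."""
    return (a / x) ** lam if x > a else 1.0

def conditional_tail(x, x0, a, lam):
    """P{X > x | X > x0} for x > x0 > a."""
    return pareto_tail(x, a, lam) / pareto_tail(x0, a, lam)

a, lam, x0 = 1.0, 2.5, 3.0  # illustrative values with x0 > a
gap = max(
    abs(conditional_tail(x, x0, a, lam) - pareto_tail(x, x0, lam))
    for x in (3.5, 5.0, 10.0)
)
print(gap)  # agreement up to floating-point rounding
```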
6.6 Order Statistics

Let $X_1, X_2, \ldots, X_n$ be $n$ independent and identically distributed continuous random
variables having a common density $f$ and distribution function $F$. Define

\[
\begin{aligned}
X_{(1)} &= \text{smallest of } X_1, X_2, \ldots, X_n \\
X_{(2)} &= \text{second smallest of } X_1, X_2, \ldots, X_n \\
&\ \ \vdots \\
X_{(j)} &= j\text{th smallest of } X_1, X_2, \ldots, X_n \\
&\ \ \vdots \\
X_{(n)} &= \text{largest of } X_1, X_2, \ldots, X_n
\end{aligned}
\]

The ordered values $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are known as the order statistics
corresponding to the random variables $X_1, X_2, \ldots, X_n$. In other words, $X_{(1)}, \ldots, X_{(n)}$ are
the ordered values of $X_1, \ldots, X_n$.

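The joint density function of the order statistics is obtained by noting that the
order statistics $X_{(1)}, \ldots, X_{(n)}$ will take on the values $x_1 \le x_2 \le \cdots \le x_n$ if and only
if, for some permutation $(i_1, i_2, \ldots, i_n)$ of $(1, 2, \ldots, n)$,

\[
X_1 = x_{i_1},\ X_2 = x_{i_2},\ \ldots,\ X_n = x_{i_n}
\]

Since, for any permutation $(i_1, \ldots, i_n)$ of $(1, 2, \ldots, n)$,

\[
\begin{aligned}
P\left\{x_{i_1} - \frac{\varepsilon}{2} < X_1 < x_{i_1} + \frac{\varepsilon}{2},\ \ldots,\ x_{i_n} - \frac{\varepsilon}{2} < X_n < x_{i_n} + \frac{\varepsilon}{2}\right\}
&\approx \varepsilon^{n} f_{X_1, \ldots, X_n}(x_{i_1}, \ldots, x_{i_n}) \\
&= \varepsilon^{n} f(x_{i_1}) \cdots f(x_{i_n}) \\
&= \varepsilon^{n} f(x_1) \cdots f(x_n)
\end{aligned}
\]

it follows that, for $x_1 < x_2 < \cdots < x_n$,

\[
P\left\{x_1 - \frac{\varepsilon}{2} < X_{(1)} < x_1 + \frac{\varepsilon}{2},\ \ldots,\ x_n - \frac{\varepsilon}{2} < X_{(n)} < x_n + \frac{\varepsilon}{2}\right\}
\approx n!\, \varepsilon^{n} f(x_1) \cdots f(x_n)
\]

Dividing by $\varepsilon^{n}$ and letting $\varepsilon \to 0$ yields

\[
f_{X_{(1)}, \ldots, X_{(n)}}(x_1, x_2, \ldots, x_n) = n!\, f(x_1) \cdots f(x_n), \qquad x_1 < x_2 < \cdots < x_n \tag{6.1}
\]

Equation (6.1) is most simply explained by arguing that, in order for the vector
$(X_{(1)}, \ldots, X_{(n)})$ to equal $(x_1, \ldots, x_n)$, it is necessary and sufficient for $(X_1, \ldots, X_n)$
to equal one of the $n!$ permutations of $(x_1, \ldots, x_n)$. Since the probability (density)
that $(X_1, \ldots, X_n)$ equals any given permutation of $(x_1, \ldots, x_n)$ is just $f(x_1) \cdots f(x_n)$,
Equation (6.1) follows.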
Along a road 1 mile long are 3 people "distributed at random." Find the probability
that no 2 people are less than a distance of $d$ miles apart when $d \le \frac{1}{2}$.

Solution Let us assume that "distributed at random" means that the positions of the
3 people are independent and uniformly distributed over the road. If $X_i$ denotes the
position of the $i$th person, then the desired probability is
$P\{X_{(i)} > X_{(i-1)} + d,\ i = 2, 3\}$. Because

\[
f_{X_{(1)}, X_{(2)}, X_{(3)}}(x_1, x_2, x_3) = 3!, \qquad 0 < x_1 < x_2 < x_3 < 1
\]

it follows that

\[
\begin{aligned}
P\{X_{(i)} > X_{(i-1)} + d,\ i = 2, 3\}
&= \iiint_{x_i > x_{i-1} + d} f_{X_{(1)}, X_{(2)}, X_{(3)}}(x_1, x_2, x_3)\, dx_1\, dx_2\, dx_3 \\
&= 3! \int_{0}^{1-2d} \int_{x_1 + d}^{1-d} \int_{x_2 + d}^{1} dx_3\, dx_2\, dx_1 \\
&= 6 \int_{0}^{1-2d} \int_{x_1 + d}^{1-d} (1 - d - x_2)\, dx_2\, dx_1 \\
&= 6 \int_{0}^{1-2d} \int_{0}^{1-2d-x_1} y_2\, dy_2\, dx_1
\end{aligned}
\]

where we have made the change of variables $y_2 = 1 - d - x_2$. Continuing the string
of equalities yields

\[
\begin{aligned}
&= 3 \int_{0}^{1-2d} (1 - 2d - x_1)^2\, dx_1 \\
&= 3 \int_{0}^{1-2d} y_1^2\, dy_1 \\
&= (1 - 2d)^3
\end{aligned}
\]


Hence, the desired probability that no 2 people are within a distance $d$ of each
other when 3 people are uniformly and independently distributed over an interval
of size 1 is $(1 - 2d)^3$ when $d \le \frac{1}{2}$. In fact, the same method can be used to prove
that when $n$ people are distributed at random over the unit interval, the desired
probability is

\[
[1 - (n - 1)d]^{n} \qquad \text{when } d \le \frac{1}{n - 1}
\]

The proof is left as an exercise.
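A simulation gives a quick sanity check of the formula $(1 - 2d)^3$. The sketch below is a Monte Carlo estimate under the stated assumption of independent uniform positions; the seed, the number of trials, and the value $d = 0.2$ are arbitrary choices.

```python
import random

def min_gap_probability(n, d, trials=200_000, seed=0):
    """Estimate the probability that n independent uniform (0, 1) points
    are pairwise at least d apart, by checking consecutive sorted points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = sorted(rng.random() for _ in range(n))
        if all(b - a >= d for a, b in zip(pts, pts[1:])):
            hits += 1
    return hits / trials

d = 0.2
estimate = min_gap_probability(3, d)
exact = (1 - 2 * d) ** 3  # the closed form derived in the text for n = 3
print(estimate, exact)
```

It suffices to compare consecutive order statistics, because if neighbors are at least $d$ apart then so is every other pair.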


The density function of the $j$th-order statistic $X_{(j)}$ can be obtained either by integrating
the joint density function (6.1) or by direct reasoning as follows: In order for
$X_{(j)}$ to equal $x$, it is necessary for $j - 1$ of the $n$ values $X_1, \ldots, X_n$ to be less than
$x$, $n - j$ of them to be greater than $x$, and 1 of them to equal $x$. Now, the probability
density that any given set of $j - 1$ of the $X_i$'s are less than $x$, another given set of
$n - j$ are all greater than $x$, and the remaining value is equal to $x$ equals

\[
[F(x)]^{j-1}[1 - F(x)]^{n-j} f(x)
\]

Hence, since there are

\[
\binom{n}{j - 1,\ n - j,\ 1} = \frac{n!}{(n - j)!\,(j - 1)!}
\]

different partitions of the $n$ random variables $X_1, \ldots, X_n$ into the preceding three
groups, it follows that the density function of $X_{(j)}$ is given by

\[
f_{X_{(j)}}(x) = \frac{n!}{(n - j)!\,(j - 1)!}\,[F(x)]^{j-1}[1 - F(x)]^{n-j} f(x) \tag{6.2}
\]


When a sample of $2n + 1$ random variables (that is, when $2n + 1$ independent and
identically distributed random variables) is observed, the $(n + 1)$st smallest is called
the sample median. If a sample of size 3 from a uniform distribution over (0, 1) is
observed, find the probability that the sample median is between $\frac{1}{4}$ and $\frac{3}{4}$.

Solution From Equation (6.2), the density of $X_{(2)}$ is given by

\[
f_{X_{(2)}}(x) = \frac{3!}{1!\,1!}\, x(1 - x), \qquad 0 < x < 1
\]

Hence,

\[
\begin{aligned}
P\left\{\frac{1}{4} < X_{(2)} < \frac{3}{4}\right\} &= 6 \int_{1/4}^{3/4} x(1 - x)\, dx \\
&= 6\left\{\frac{x^2}{2} - \frac{x^3}{3}\right\}\Bigg|_{x=1/4}^{x=3/4} \\
&= \frac{11}{16}
\end{aligned}
\]

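The cumulative distribution function of $X_{(j)}$ can be found by integrating Equation (6.2).
That is,

\[
F_{X_{(j)}}(y) = \frac{n!}{(n - j)!\,(j - 1)!} \int_{-\infty}^{y} [F(x)]^{j-1}[1 - F(x)]^{n-j} f(x)\, dx \tag{6.3}
\]

However, $F_{X_{(j)}}(y)$ could also have been derived directly by noting that the $j$th order
statistic is less than or equal to $y$ if and only if there are $j$ or more of the $X_i$'s that are
less than or equal to $y$. Thus, because the number of $X_i$'s that are less than or equal
to $y$ is a binomial random variable with parameters $n$, $p = F(y)$, it follows that

\[
\begin{aligned}
F_{X_{(j)}}(y) &= P\{X_{(j)} \le y\} = P\{j \text{ or more of the } X_i\text{'s are} \le y\} \\
&= \sum_{k=j}^{n} \binom{n}{k} [F(y)]^{k}[1 - F(y)]^{n-k}
\end{aligned} \tag{6.4}
\]

If, in Equations (6.3) and (6.4), we take $F$ to be the uniform (0, 1) distribution
[that is, $f(x) = 1$, $0 < x < 1$], then we obtain the interesting analytical identity

\[
\sum_{k=j}^{n} \binom{n}{k} y^{k}(1 - y)^{n-k} = \frac{n!}{(n - j)!\,(j - 1)!} \int_{0}^{y} x^{j-1}(1 - x)^{n-j}\, dx, \qquad 0 \le y \le 1 \tag{6.5}
\]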
We can employ the same type of argument that we used in establishing Equation
(6.2) to find the joint density of $X_{(i)}$ and $X_{(j)}$, the $i$th and $j$th smallest of the values
$X_1, \ldots, X_n$. For suppose that $i < j$ and $x_i < x_j$. Then the event that $X_{(i)} = x_i$, $X_{(j)} = x_j$
is equivalent to the event that the $n$ data values can be divided into 5 groups of
respective sizes $i - 1$, $1$, $j - i - 1$, $1$, $n - j$, that satisfy the condition that all $i - 1$
members of the first group have values less than $x_i$, the one member of the second
group has value $x_i$, all $j - i - 1$ members of the third group have values between $x_i$
and $x_j$, the one member of the fourth group has value $x_j$, and all $n - j$ members of the
last group have values greater than $x_j$. Now, for any specified division of the $n$ values
into 5 such groups, the preceding condition will hold with probability (density)

\[
[F(x_i)]^{i-1} f(x_i)\, [F(x_j) - F(x_i)]^{j-i-1} f(x_j)\, [1 - F(x_j)]^{n-j}
\]

As there are $\binom{n}{i-1,\,1,\,j-i-1,\,1,\,n-j} = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}$ such divisions of the
$n$ values, and as the condition cannot hold for more than one of these divisions, it follows,
for $i < j$, $x_i < x_j$, that

\[
f_{X_{(i)}, X_{(j)}}(x_i, x_j) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\,
[F(x_i)]^{i-1} f(x_i)\, [F(x_j) - F(x_i)]^{j-i-1} f(x_j)\, [1 - F(x_j)]^{n-j} \tag{6.6}
\]


Distribution of the range of a random sample

Suppose that $n$ independent and identically distributed random variables $X_1, X_2, \ldots,
X_n$ are observed. The random variable $R$ defined by $R = X_{(n)} - X_{(1)}$ is called the
range of the observed random variables. If the random variables $X_i$ have distribution
function $F$ and density function $f$, then the distribution of $R$ can be obtained from
Equation (6.6) as follows: For $a \ge 0$,

\[
\begin{aligned}
P\{R \le a\} &= P\{X_{(n)} - X_{(1)} \le a\} \\
&= \iint_{x_n - x_1 \le a} f_{X_{(1)}, X_{(n)}}(x_1, x_n)\, dx_1\, dx_n \\
&= \int_{-\infty}^{\infty} \int_{x_1}^{x_1 + a} \frac{n!}{(n-2)!}\,[F(x_n) - F(x_1)]^{n-2} f(x_1) f(x_n)\, dx_n\, dx_1
\end{aligned}
\]

Making the change of variable $y = F(x_n) - F(x_1)$, $dy = f(x_n)\, dx_n$ yields

\[
\begin{aligned}
\int_{x_1}^{x_1 + a} [F(x_n) - F(x_1)]^{n-2} f(x_n)\, dx_n &= \int_{0}^{F(x_1 + a) - F(x_1)} y^{n-2}\, dy \\
&= \frac{1}{n-1}\,[F(x_1 + a) - F(x_1)]^{n-1}
\end{aligned}
\]

Thus,

\[
P\{R \le a\} = n \int_{-\infty}^{\infty} [F(x_1 + a) - F(x_1)]^{n-1} f(x_1)\, dx_1 \tag{6.7}
\]


Equation (6.7) can be evaluated explicitly only in a few special cases. One such case
is when the $X_i$'s are all uniformly distributed on (0, 1). In this case, we obtain, from
Equation (6.7), that for $0 < a < 1$,

\[
\begin{aligned}
P\{R < a\} &= n \int_{0}^{1} [F(x_1 + a) - F(x_1)]^{n-1} f(x_1)\, dx_1 \\
&= n \int_{0}^{1-a} a^{n-1}\, dx_1 + n \int_{1-a}^{1} (1 - x_1)^{n-1}\, dx_1 \\
&= n(1 - a)a^{n-1} + a^{n}
\end{aligned}
\]

Differentiation yields the density function of the range, given in this case by

\[
f_R(a) =
\begin{cases}
n(n-1)a^{n-2}(1 - a) & 0 \le a \le 1 \\
0 & \text{otherwise}
\end{cases}
\]

That is, the range of $n$ independent uniform (0, 1) random variables is a beta random
variable with parameters $n - 1$, 2.
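The closed form $n(1 - a)a^{n-1} + a^{n}$ can be compared with a simulated estimate. The following is a Monte Carlo sketch; the values $n = 5$, $a = 0.6$, the seed, and the number of trials are arbitrary and chosen only for illustration.

```python
import random

def range_cdf_uniform(n, a):
    """P{R <= a} for the range of n independent uniform (0, 1) values, 0 < a < 1."""
    return n * (1 - a) * a ** (n - 1) + a ** n

def simulate_range_cdf(n, a, trials=200_000, seed=1):
    """Monte Carlo estimate of P{max - min <= a}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        if max(xs) - min(xs) <= a:
            hits += 1
    return hits / trials

n, a = 5, 0.6
exact = range_cdf_uniform(n, a)
estimate = simulate_range_cdf(n, a)
print(exact, estimate)
```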


6.7 Joint Probability Distribution of Functions of Random Variables 


Let $X_1$ and $X_2$ be jointly continuous random variables with joint probability density
function $f_{X_1, X_2}$. It is sometimes necessary to obtain the joint distribution of the random
variables $Y_1$ and $Y_2$, which arise as functions of $X_1$ and $X_2$. Specifically, suppose
that $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$ for some functions $g_1$ and $g_2$.

Assume that the functions $g_1$ and $g_2$ satisfy the following conditions:

1. The equations $y_1 = g_1(x_1, x_2)$ and $y_2 = g_2(x_1, x_2)$ can be uniquely solved for $x_1$
and $x_2$ in terms of $y_1$ and $y_2$, with solutions given by, say, $x_1 = h_1(y_1, y_2)$,
$x_2 = h_2(y_1, y_2)$.

2. The functions $g_1$ and $g_2$ have continuous partial derivatives at all points $(x_1, x_2)$
and are such that the $2 \times 2$ determinant

\[
J(x_1, x_2) =
\begin{vmatrix}
\dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} \\[2mm]
\dfrac{\partial g_2}{\partial x_1} & \dfrac{\partial g_2}{\partial x_2}
\end{vmatrix}
\equiv \frac{\partial g_1}{\partial x_1}\frac{\partial g_2}{\partial x_2} - \frac{\partial g_1}{\partial x_2}\frac{\partial g_2}{\partial x_1} \ne 0
\]

at all points $(x_1, x_2)$.


Under these two conditions, it can be shown that the random variables $Y_1$ and
$Y_2$ are jointly continuous with joint density function given by

\[
f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}(x_1, x_2)\, |J(x_1, x_2)|^{-1} \tag{7.1}
\]

where $x_1 = h_1(y_1, y_2)$, $x_2 = h_2(y_1, y_2)$.
A proof of Equation (7.1) would proceed along the following lines:

\[
P\{Y_1 \le y_1, Y_2 \le y_2\} = \iint_{\substack{(x_1, x_2):\\ g_1(x_1, x_2) \le y_1\\ g_2(x_1, x_2) \le y_2}} f_{X_1, X_2}(x_1, x_2)\, dx_1\, dx_2 \tag{7.2}
\]

The joint density function can now be obtained by differentiating Equation (7.2) with
respect to $y_1$ and $y_2$. That the result of this differentiation will be equal to the right-hand
side of Equation (7.1) is an exercise in advanced calculus whose proof will not
be presented in this book.
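Example 7a
Let $X_1$ and $X_2$ be jointly continuous random variables with probability density function
$f_{X_1, X_2}$. Let $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$. Find the joint density function of $Y_1$
and $Y_2$ in terms of $f_{X_1, X_2}$.

Solution Let $g_1(x_1, x_2) = x_1 + x_2$ and $g_2(x_1, x_2) = x_1 - x_2$. Then

\[
J(x_1, x_2) =
\begin{vmatrix}
1 & 1 \\
1 & -1
\end{vmatrix}
= -2
\]

Also, since the equations $y_1 = x_1 + x_2$ and $y_2 = x_1 - x_2$ have $x_1 = (y_1 + y_2)/2$,
$x_2 = (y_1 - y_2)/2$ as their solution, it follows from Equation (7.1) that the desired density is

\[
f_{Y_1, Y_2}(y_1, y_2) = \frac{1}{2}\, f_{X_1, X_2}\left(\frac{y_1 + y_2}{2}, \frac{y_1 - y_2}{2}\right)
\]

For instance, if $X_1$ and $X_2$ are independent uniform (0, 1) random variables, then

\[
f_{Y_1, Y_2}(y_1, y_2) =
\begin{cases}
\frac{1}{2} & 0 \le y_1 + y_2 \le 2,\ 0 \le y_1 - y_2 \le 2 \\
0 & \text{otherwise}
\end{cases}
\]

or if $X_1$ and $X_2$ are independent exponential random variables with respective parameters
$\lambda_1$ and $\lambda_2$, then

\[
f_{Y_1, Y_2}(y_1, y_2) =
\begin{cases}
\dfrac{\lambda_1\lambda_2}{2} \exp\left\{-\lambda_1\left(\dfrac{y_1 + y_2}{2}\right) - \lambda_2\left(\dfrac{y_1 - y_2}{2}\right)\right\} & y_1 + y_2 \ge 0,\ y_1 - y_2 \ge 0 \\
0 & \text{otherwise}
\end{cases}
\]

Finally, if $X_1$ and $X_2$ are independent standard normal random variables, then

\[
\begin{aligned}
f_{Y_1, Y_2}(y_1, y_2) &= \frac{1}{4\pi}\, e^{-[(y_1 + y_2)^2/8 + (y_1 - y_2)^2/8]} \\
&= \frac{1}{4\pi}\, e^{-(y_1^2 + y_2^2)/4} \\
&= \frac{1}{\sqrt{4\pi}}\, e^{-y_1^2/4}\; \frac{1}{\sqrt{4\pi}}\, e^{-y_2^2/4}
\end{aligned}
\]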

Thus, not only do we obtain (in agreement with Proposition 3.2) that both $X_1 + X_2$
and $X_1 - X_2$ are normal with mean 0 and variance 2, but we also conclude that these
two random variables are independent. (In fact, it can be shown that if $X_1$ and $X_2$
are independent random variables having a common distribution function $F$, then
$X_1 + X_2$ will be independent of $X_1 - X_2$ if and only if $F$ is a normal distribution
function.)

Let $(X, Y)$ denote a random point in the plane, and assume that the rectangular
coordinates $X$ and $Y$ are independent standard normal random variables. We are
interested in the joint distribution of $R$, $\Theta$, the polar coordinate representation of
$(x, y)$. (See Figure 6.4.)

Figure 6.4 Random point. $(X, Y) = (R, \Theta)$.

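Suppose first that $X$ and $Y$ are both positive. For $x$ and $y$ positive, letting
$r = g_1(x, y) = \sqrt{x^2 + y^2}$ and $\theta = g_2(x, y) = \tan^{-1}(y/x)$, we see that

\[
\begin{aligned}
\frac{\partial g_1}{\partial x} &= \frac{x}{\sqrt{x^2 + y^2}} \\
\frac{\partial g_1}{\partial y} &= \frac{y}{\sqrt{x^2 + y^2}} \\
\frac{\partial g_2}{\partial x} &= \frac{1}{1 + (y/x)^2}\left(\frac{-y}{x^2}\right) = \frac{-y}{x^2 + y^2} \\
\frac{\partial g_2}{\partial y} &= \frac{1}{x[1 + (y/x)^2]} = \frac{x}{x^2 + y^2}
\end{aligned}
\]

Hence,

\[
J(x, y) = \frac{x^2}{(x^2 + y^2)^{3/2}} + \frac{y^2}{(x^2 + y^2)^{3/2}} = \frac{1}{\sqrt{x^2 + y^2}} = \frac{1}{r}
\]

Because the conditional joint density function of $X$, $Y$ given that they are both
positive is

\[
f(x, y \mid X > 0, Y > 0) = \frac{f(x, y)}{P(X > 0, Y > 0)} = \frac{2}{\pi}\, e^{-(x^2 + y^2)/2}, \qquad x > 0,\ y > 0
\]

we see that the conditional joint density function of $R = \sqrt{X^2 + Y^2}$ and
$\Theta = \tan^{-1}(Y/X)$, given that $X$ and $Y$ are both positive, is

\[
f(r, \theta \mid X > 0, Y > 0) = \frac{2}{\pi}\, r e^{-r^2/2}, \qquad 0 < \theta < \pi/2,\ 0 < r < \infty
\]

Similarly, we can show that

\[
\begin{aligned}
f(r, \theta \mid X < 0, Y > 0) &= \frac{2}{\pi}\, r e^{-r^2/2}, \qquad \pi/2 < \theta < \pi,\ 0 < r < \infty \\
f(r, \theta \mid X < 0, Y < 0) &= \frac{2}{\pi}\, r e^{-r^2/2}, \qquad \pi < \theta < 3\pi/2,\ 0 < r < \infty \\
f(r, \theta \mid X > 0, Y < 0) &= \frac{2}{\pi}\, r e^{-r^2/2}, \qquad 3\pi/2 < \theta < 2\pi,\ 0 < r < \infty
\end{aligned}
\]

As the joint density is an equally weighted average of these four conditional joint
densities, we obtain that the joint density of $R$, $\Theta$ is given by

\[
f(r, \theta) = \frac{1}{2\pi}\, r e^{-r^2/2}, \qquad 0 < \theta < 2\pi,\ 0 < r < \infty
\]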
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Now, this joint density factors into the marginal densities for R and ©, so R and 0 
are independent random variables, with © being uniformly distributed over (0,27) 
and R having the Rayleigh distribution with density 


fnN= re? /? 0<r<o 


(For instance, when one is aiming at a target in the plane, if the horizontal and verti- 
cal miss distances are independent standard normals, then the absolute value of the 
error has the preceding Rayleigh distribution.) 

This result is quite interesting, for it certainly is not evident a priori that a ran- 
dom vector whose coordinates are independent standard normal random variables 
will have an angle of orientation that not only is uniformly distributed, but also is 
independent of the vector’s distance from the origin. 
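
The claim can be checked numerically. The following Python sketch is an illustration only and is not part of the derivation: it draws standard normal pairs with the standard library's `random.gauss`, converts each pair to polar coordinates, and returns the sample mean of the angle (which should be near $\pi$), the sample mean of the distance (near $\sqrt{\pi/2}$, the mean of the Rayleigh density above), and the sample correlation between the two (near 0, which is consistent with independence but does not prove it). The function name `polar_summary`, the sample size, and the seed are arbitrary choices.

```python
import math
import random


def polar_summary(n_samples=100_000, seed=0):
    """Return (mean angle, mean radius, angle-radius correlation) for N(0,1) pairs."""
    rng = random.Random(seed)
    thetas, radii = [], []
    for _ in range(n_samples):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        radii.append(math.hypot(x, y))
        # atan2 returns values in (-pi, pi]; shift them to [0, 2*pi)
        thetas.append(math.atan2(y, x) % (2.0 * math.pi))

    mean_t = sum(thetas) / n_samples
    mean_r = sum(radii) / n_samples
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(thetas, radii)) / n_samples
    var_t = sum((t - mean_t) ** 2 for t in thetas) / n_samples
    var_r = sum((r - mean_r) ** 2 for r in radii) / n_samples
    return mean_t, mean_r, cov / math.sqrt(var_t * var_r)


if __name__ == "__main__":
    print(polar_summary())
```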

If we wanted the joint distribution of $R^2$ and $\Theta$, then, since the transformation
$d = g_1(x, y) = x^2 + y^2$ and $\theta = g_2(x, y) = \tan^{-1}(y/x)$ has the Jacobian

$$J = \begin{vmatrix} 2x & 2y \\ \dfrac{-y}{x^2 + y^2} & \dfrac{x}{x^2 + y^2} \end{vmatrix} = 2$$

it follows that

$$f(d, \theta) = \frac{1}{2} e^{-d/2}\, \frac{1}{2\pi}, \qquad 0 < d < \infty,\ 0 < \theta < 2\pi$$

Therefore, $R^2$ and $\Theta$ are independent, with $R^2$ having an exponential distribution
with parameter $\frac{1}{2}$. But because $R^2 = X^2 + Y^2$, it follows by definition that $R^2$ has
a chi-squared distribution with 2 degrees of freedom. Hence, we have a verification
of the result that the exponential distribution with parameter $\frac{1}{2}$ is the same as the
chi-squared distribution with 2 degrees of freedom.

The preceding result can be used to simulate (or generate) normal random variables
by making a suitable transformation on uniform random variables. Let $U_1$ and
$U_2$ be independent random variables, each uniformly distributed over $(0, 1)$. We will
transform $U_1, U_2$ into two independent standard normal random variables $X_1$ and
$X_2$ by first considering the polar coordinate representation $(R, \Theta)$ of the random
vector $(X_1, X_2)$. From the preceding, $R^2$ and $\Theta$ will be independent, and, in addition,
$R^2 = X_1^2 + X_2^2$ will have an exponential distribution with parameter $\lambda = \frac{1}{2}$.
But $-2 \log U_1$ has such a distribution, since, for $x > 0$,

$$P\{-2 \log U_1 < x\} = P\left\{\log U_1 > -\frac{x}{2}\right\} = P\{U_1 > e^{-x/2}\} = 1 - e^{-x/2}$$

Also, because $2\pi U_2$ is a uniform $(0, 2\pi)$ random variable, we can use it to generate
$\Theta$. That is, if we let

$$R^2 = -2 \log U_1, \qquad \Theta = 2\pi U_2$$

then $R^2$ can be taken to be the square of the distance from the origin and $\Theta$ can be
taken to be the angle of orientation of $(X_1, X_2)$. Now, since $X_1 = R \cos \Theta$ and
$X_2 = R \sin \Theta$, it follows that

$$X_1 = \sqrt{-2 \log U_1}\, \cos(2\pi U_2)$$

$$X_2 = \sqrt{-2 \log U_1}\, \sin(2\pi U_2)$$

are independent standard normal random variables.
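
The transformation just derived can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the names `box_muller` and `sample_normals` are chosen here for convenience and do not come from the text. The one implementation detail that is an assumption of the sketch is the use of `1 - rng.random()` for $U_1$, which keeps the argument of the logarithm strictly positive.

```python
import math
import random


def box_muller(u1, u2):
    """Map two uniform (0, 1) values to a pair of standard normal values."""
    r = math.sqrt(-2.0 * math.log(u1))  # R, since R^2 = -2 log U1
    theta = 2.0 * math.pi * u2          # Theta = 2 pi U2
    return r * math.cos(theta), r * math.sin(theta)


def sample_normals(n, seed=0):
    """Generate n standard normal values from pairs of uniforms."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is finite
        u2 = rng.random()
        out.extend(box_muller(u1, u2))
    return out[:n]


if __name__ == "__main__":
    xs = sample_normals(100_000)
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    print(mean, var)  # should be close to 0 and 1
```

Each pair of uniforms yields two normal values, so the generator consumes $n$ uniforms (rounded up to an even number) to produce $n$ normals.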


Example 7c

If $X$ and $Y$ are independent gamma random variables with parameters $(\alpha, \lambda)$ and
$(\beta, \lambda)$, respectively, compute the joint density of $U = X + Y$ and $V = X/(X + Y)$.


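Solution The joint density of $X$ and $Y$ is given by

$$f_{X,Y}(x, y) = \frac{\lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}}{\Gamma(\alpha)} \cdot \frac{\lambda e^{-\lambda y} (\lambda y)^{\beta - 1}}{\Gamma(\beta)} = \frac{\lambda^{\alpha + \beta}}{\Gamma(\alpha)\Gamma(\beta)}\, e^{-\lambda(x + y)} x^{\alpha - 1} y^{\beta - 1}$$

Now, if $g_1(x, y) = x + y$, $g_2(x, y) = x/(x + y)$, then

$$\frac{\partial g_1}{\partial x} = \frac{\partial g_1}{\partial y} = 1, \qquad \frac{\partial g_2}{\partial x} = \frac{y}{(x + y)^2}, \qquad \frac{\partial g_2}{\partial y} = -\frac{x}{(x + y)^2}$$

so

$$J(x, y) = \begin{vmatrix} 1 & 1 \\ \dfrac{y}{(x + y)^2} & \dfrac{-x}{(x + y)^2} \end{vmatrix} = -\frac{1}{x + y}$$

Finally, as the equations $u = x + y$, $v = x/(x + y)$ have as their solutions $x = uv$,
$y = u(1 - v)$, we see that

$$f_{U,V}(u, v) = f_{X,Y}[uv, u(1 - v)]\,u = \frac{\lambda e^{-\lambda u} (\lambda u)^{\alpha + \beta - 1}}{\Gamma(\alpha + \beta)} \cdot \frac{v^{\alpha - 1}(1 - v)^{\beta - 1}\,\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}$$

Hence, $X + Y$ and $X/(X + Y)$ are independent, with $X + Y$ having a gamma distribution
with parameters $(\alpha + \beta, \lambda)$ and $X/(X + Y)$ having a beta distribution with
parameters $(\alpha, \beta)$. The preceding reasoning also shows that $B(\alpha, \beta)$, the normalizing
factor in the beta density, is such that

$$B(\alpha, \beta) \equiv \int_0^1 v^{\alpha - 1}(1 - v)^{\beta - 1}\,dv = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$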

This entire result is quite interesting. For suppose there are $n + m$ jobs to be performed,
each (independently) taking an exponential amount of time with rate $\lambda$ to
be completed and suppose that we have two workers to perform these jobs. Worker
I will do jobs $1, 2, \ldots, n$, and worker II will do the remaining $m$ jobs. If we let $X$ and
$Y$ denote the total working times of workers I and II, respectively, then (either from
the foregoing result or from Example 3b) $X$ and $Y$ will be independent gamma random
variables having parameters $(n, \lambda)$ and $(m, \lambda)$, respectively. It then follows that
independently of the working time needed to complete all $n + m$ jobs (that is, of
$X + Y$), the proportion of this work that will be performed by worker I has a beta
distribution with parameters $(n, m)$.
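
A simulation can make the gamma and beta conclusion concrete. The sketch below is illustrative only: it assumes the parameter values $\alpha = 2$, $\beta = 3$, $\lambda = 1.5$ (chosen arbitrarily), and it relies on the fact that the second argument of the standard library's `random.gammavariate` is a scale, so a gamma variable with rate $\lambda$ is drawn with scale $1/\lambda$. It returns the sample means of $U = X + Y$ and $V = X/(X + Y)$, which should be near $(\alpha + \beta)/\lambda$ and $\alpha/(\alpha + \beta)$, and their sample correlation, which should be near 0.

```python
import math
import random


def gamma_beta_check(alpha=2.0, beta=3.0, lam=1.5, n_samples=50_000, seed=0):
    """Return (mean of U, mean of V, correlation of U and V)."""
    rng = random.Random(seed)
    us, vs = [], []
    for _ in range(n_samples):
        x = rng.gammavariate(alpha, 1.0 / lam)  # scale = 1 / rate
        y = rng.gammavariate(beta, 1.0 / lam)
        us.append(x + y)
        vs.append(x / (x + y))

    mean_u = sum(us) / n_samples
    mean_v = sum(vs) / n_samples
    cov = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / n_samples
    var_u = sum((u - mean_u) ** 2 for u in us) / n_samples
    var_v = sum((v - mean_v) ** 2 for v in vs) / n_samples
    return mean_u, mean_v, cov / math.sqrt(var_u * var_v)


if __name__ == "__main__":
    print(gamma_beta_check())
```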
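When the joint density function of the $n$ random variables $X_1, X_2, \ldots, X_n$ is given
and we want to compute the joint density function of $Y_1, Y_2, \ldots, Y_n$, where

$$Y_1 = g_1(X_1, \ldots, X_n), \quad Y_2 = g_2(X_1, \ldots, X_n), \quad \ldots, \quad Y_n = g_n(X_1, \ldots, X_n)$$

the approach is the same: namely, we assume that the functions $g_i$ have continuous
partial derivatives and that the Jacobian determinant

$$J(x_1, \ldots, x_n) = \begin{vmatrix} \dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\ \dfrac{\partial g_2}{\partial x_1} & \dfrac{\partial g_2}{\partial x_2} & \cdots & \dfrac{\partial g_2}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial g_n}{\partial x_1} & \dfrac{\partial g_n}{\partial x_2} & \cdots & \dfrac{\partial g_n}{\partial x_n} \end{vmatrix} \neq 0$$

at all points $(x_1, \ldots, x_n)$. Furthermore, we suppose that the equations
$y_1 = g_1(x_1, \ldots, x_n)$, $y_2 = g_2(x_1, \ldots, x_n)$, $\ldots$, $y_n = g_n(x_1, \ldots, x_n)$ have a unique solution,
say, $x_1 = h_1(y_1, \ldots, y_n), \ldots, x_n = h_n(y_1, \ldots, y_n)$. Under these assumptions, the joint
density function of the random variables $Y_i$ is given by

$$f_{Y_1, \ldots, Y_n}(y_1, \ldots, y_n) = f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)\,|J(x_1, \ldots, x_n)|^{-1} \tag{7.3}$$

where $x_i = h_i(y_1, \ldots, y_n)$, $i = 1, 2, \ldots, n$.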
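Example 7d

Let $X_1$, $X_2$, and $X_3$ be independent standard normal random variables. If
$Y_1 = X_1 + X_2 + X_3$, $Y_2 = X_1 - X_2$, and $Y_3 = X_1 - X_3$, compute the joint density function
of $Y_1, Y_2, Y_3$.

Solution Letting $Y_1 = X_1 + X_2 + X_3$, $Y_2 = X_1 - X_2$, $Y_3 = X_1 - X_3$, the Jacobian
of these transformations is given by

$$J = \begin{vmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{vmatrix} = 3$$

As the preceding transformations yield that

$$X_1 = \frac{Y_1 + Y_2 + Y_3}{3}, \qquad X_2 = \frac{Y_1 - 2Y_2 + Y_3}{3}, \qquad X_3 = \frac{Y_1 + Y_2 - 2Y_3}{3}$$

we see from Equation (7.3) that

$$f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) = \frac{1}{3}\, f_{X_1,X_2,X_3}\!\left(\frac{y_1 + y_2 + y_3}{3}, \frac{y_1 - 2y_2 + y_3}{3}, \frac{y_1 + y_2 - 2y_3}{3}\right)$$

Hence, as

$$f_{X_1,X_2,X_3}(x_1, x_2, x_3) = \frac{1}{(2\pi)^{3/2}}\, e^{-\sum_{i=1}^{3} x_i^2/2}$$

we see that

$$f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) = \frac{1}{3(2\pi)^{3/2}}\, e^{-Q(y_1, y_2, y_3)/2}$$

where

$$Q(y_1, y_2, y_3) = \left(\frac{y_1 + y_2 + y_3}{3}\right)^2 + \left(\frac{y_1 - 2y_2 + y_3}{3}\right)^2 + \left(\frac{y_1 + y_2 - 2y_3}{3}\right)^2 = \frac{y_1^2}{3} + \frac{2}{3} y_2^2 + \frac{2}{3} y_3^2 - \frac{2}{3} y_2 y_3$$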
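
The algebra in this example is easy to verify with exact rational arithmetic. The Python sketch below is a check written for illustration (the function names are not from the text): it computes the $3 \times 3$ Jacobian determinant by cofactor expansion, confirms that the stated inverse transformation undoes the forward one, and compares the sum-of-squares form of $Q$ with its expanded form at a sample point.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def forward(x1, x2, x3):
    """(x1, x2, x3) -> (y1, y2, y3) as defined in the example."""
    return x1 + x2 + x3, x1 - x2, x1 - x3


def inverse(y1, y2, y3):
    """(y1, y2, y3) -> (x1, x2, x3), the solution stated in the example."""
    third = Fraction(1, 3)
    return (third * (y1 + y2 + y3),
            third * (y1 - 2 * y2 + y3),
            third * (y1 + y2 - 2 * y3))


def q_sum_of_squares(y1, y2, y3):
    return sum(x * x for x in inverse(y1, y2, y3))


def q_expanded(y1, y2, y3):
    return (Fraction(1, 3) * y1 ** 2 + Fraction(2, 3) * y2 ** 2
            + Fraction(2, 3) * y3 ** 2 - Fraction(2, 3) * y2 * y3)


if __name__ == "__main__":
    print(det3([[1, 1, 1], [1, -1, 0], [1, 0, -1]]))
    print(inverse(*forward(2, -5, 7)))
    print(q_sum_of_squares(1, 2, 3) == q_expanded(1, 2, 3))
```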


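Example 7e

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed exponential random
variables with rate $\lambda$. Let

$$Y_i = X_1 + \cdots + X_i, \qquad i = 1, \ldots, n$$

(a) Find the joint density function of $Y_1, \ldots, Y_n$.
(b) Use the result of part (a) to find the density of $Y_n$.
(c) Find the conditional density of $Y_1, \ldots, Y_{n-1}$ given that $Y_n = t$.

Solution (a) The Jacobian of the transformations $Y_1 = X_1$, $Y_2 = X_1 + X_2$, $\ldots$,
$Y_n = X_1 + \cdots + X_n$ is

$$J = \begin{vmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 1 & 0 & \cdots & 0 \\ \vdots & & & & & \vdots \\ 1 & 1 & 1 & 1 & \cdots & 1 \end{vmatrix}$$

Since only the first term of the determinant will be nonzero, we have $J = 1$.
Now, the joint density function of $X_1, \ldots, X_n$ is given by

$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}, \qquad 0 < x_i < \infty,\ i = 1, \ldots, n$$

Hence, because the preceding transformations yield
$X_1 = Y_1, X_2 = Y_2 - Y_1, \ldots, X_i = Y_i - Y_{i-1}, \ldots, X_n = Y_n - Y_{n-1}$,
it follows from Equation (7.3) that the joint density function of $Y_1, \ldots, Y_n$ is

$$f_{Y_1, \ldots, Y_n}(y_1, \ldots, y_n) = f_{X_1, \ldots, X_n}(y_1, y_2 - y_1, \ldots, y_n - y_{n-1})$$

$$= \lambda^n \exp\left\{-\lambda\left[y_1 + \sum_{i=2}^{n} (y_i - y_{i-1})\right]\right\}$$

$$= \lambda^n e^{-\lambda y_n}, \qquad 0 < y_1,\ 0 < y_i - y_{i-1},\ i = 2, \ldots, n$$

$$= \lambda^n e^{-\lambda y_n}, \qquad 0 < y_1 < y_2 < \cdots < y_n$$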


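(b) To obtain the marginal density of $Y_n$, let us integrate out the other variables
one at a time. Doing this gives

$$f_{Y_2, \ldots, Y_n}(y_2, \ldots, y_n) = \int_0^{y_2} \lambda^n e^{-\lambda y_n}\,dy_1 = \lambda^n y_2 e^{-\lambda y_n}, \qquad 0 < y_2 < y_3 < \cdots < y_n$$

Continuing, we obtain

$$f_{Y_3, \ldots, Y_n}(y_3, \ldots, y_n) = \int_0^{y_3} \lambda^n y_2 e^{-\lambda y_n}\,dy_2 = \lambda^n \frac{y_3^2}{2} e^{-\lambda y_n}, \qquad 0 < y_3 < y_4 < \cdots < y_n$$

The next integration yields

$$f_{Y_4, \ldots, Y_n}(y_4, \ldots, y_n) = \lambda^n \frac{y_4^3}{3!} e^{-\lambda y_n}, \qquad 0 < y_4 < \cdots < y_n$$

Continuing in this fashion gives

$$f_{Y_n}(y_n) = \lambda^n \frac{y_n^{n-1}}{(n-1)!} e^{-\lambda y_n}, \qquad 0 < y_n$$

which, in agreement with the result obtained in Example 3b, shows that
$X_1 + \cdots + X_n$ is a gamma random variable with parameters $n$ and $\lambda$.

(c) The conditional density of $Y_1, \ldots, Y_{n-1}$ given that $Y_n = t$ is, for
$0 < y_1 < \cdots < y_{n-1} < t$,

$$f_{Y_1, \ldots, Y_{n-1} \mid Y_n}(y_1, \ldots, y_{n-1} \mid t) = \frac{f_{Y_1, \ldots, Y_{n-1}, Y_n}(y_1, \ldots, y_{n-1}, t)}{f_{Y_n}(t)} = \frac{\lambda^n e^{-\lambda t}}{\lambda e^{-\lambda t} (\lambda t)^{n-1}/(n-1)!} = \frac{(n-1)!}{t^{n-1}}$$

Because $f(y) = 1/t$, $0 < y < t$, is the density of a uniform random variable on
$(0, t)$, it follows that conditional on $Y_n = t$, $Y_1, \ldots, Y_{n-1}$ are distributed as the
order statistics of $n - 1$ independent uniform $(0, t)$ random variables.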


6.8 Exchangeable Random Variables


The random variables $X_1, X_2, \ldots, X_n$ are said to be exchangeable if, for every permutation
$i_1, \ldots, i_n$ of the integers $1, \ldots, n$,

$$P\{X_{i_1} \le x_1, X_{i_2} \le x_2, \ldots, X_{i_n} \le x_n\} = P\{X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n\}$$

for all $x_1, \ldots, x_n$. That is, the $n$ random variables are exchangeable if their joint distribution
is the same no matter in which order the variables are observed.
Discrete random variables will be exchangeable if

$$P\{X_{i_1} = x_1, X_{i_2} = x_2, \ldots, X_{i_n} = x_n\} = P\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\}$$

for all permutations $i_1, \ldots, i_n$, and all values $x_1, \ldots, x_n$. This is equivalent to stating
that $p(x_1, x_2, \ldots, x_n) = P\{X_1 = x_1, \ldots, X_n = x_n\}$ is a symmetric function of the
vector $(x_1, \ldots, x_n)$, which means that its value does not change when the values of
the vector are permuted.


Example 8a

Suppose that balls are withdrawn one at a time and without replacement from an
urn that initially contains $n$ balls, of which $k$ are considered special, in such a manner
that each withdrawal is equally likely to be any of the balls that remains in the
urn at the time. Let $X_i = 1$ if the $i$th ball withdrawn is special and let $X_i = 0$ otherwise.
We will show that the random variables $X_1, \ldots, X_n$ are exchangeable. To
do so, let $(x_1, \ldots, x_n)$ be a vector consisting of $k$ ones and $n - k$ zeros. However,
before considering the joint mass function evaluated at $(x_1, \ldots, x_n)$, let us try to gain
some insight by considering a fixed such vector; for instance, consider the vector
$(1, 1, 0, 1, 0, \ldots, 0, 1)$, which is assumed to have $k$ ones and $n - k$ zeros. Then

$$p(1, 1, 0, 1, 0, \ldots, 0, 1) = \frac{k}{n} \cdot \frac{k - 1}{n - 1} \cdot \frac{n - k}{n - 2} \cdot \frac{k - 2}{n - 3} \cdot \frac{n - k - 1}{n - 4} \cdots \frac{1}{2} \cdot \frac{1}{1}$$

which follows because the probability that the first ball is special is $k/n$, the conditional
probability that the next one is special is $(k - 1)/(n - 1)$, the conditional
probability that the next one is not special is $(n - k)/(n - 2)$, and so on. By the same
argument, it follows that $p(x_1, \ldots, x_n)$ can be expressed as the product of $n$ fractions.
The successive denominator terms of these fractions will go from $n$ down to 1. The
numerator term at the location where the vector $(x_1, \ldots, x_n)$ is 1 for the $i$th time is
$k - (i - 1)$, and where it is 0 for the $i$th time it is $n - k - (i - 1)$. Hence, since
the vector $(x_1, \ldots, x_n)$ consists of $k$ ones and $n - k$ zeros, we obtain

$$p(x_1, \ldots, x_n) = \frac{k!\,(n - k)!}{n!}, \qquad x_i = 0, 1,\ \sum_{i=1}^{n} x_i = k$$

Since this is a symmetric function of $(x_1, \ldots, x_n)$, it follows that the random variables
are exchangeable.
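
For small $n$ and $k$ the result can be confirmed by direct enumeration. The following Python sketch is an illustrative check (the function names are introduced here, not taken from the text): it multiplies the successive conditional probabilities for a given 0/1 vector using exact fractions and compares the product with $k!\,(n-k)!/n!$ for every arrangement of the ones and zeros.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def sequence_probability(xs, n, k):
    """Probability of observing the 0/1 sequence xs when drawing without
    replacement from an urn with k special balls among n."""
    special, ordinary = k, n - k
    p = Fraction(1)
    for x in xs:
        remaining = special + ordinary
        if x == 1:
            p *= Fraction(special, remaining)
            special -= 1
        else:
            p *= Fraction(ordinary, remaining)
            ordinary -= 1
    return p


def closed_form(n, k):
    """The value k!(n-k)!/n! derived in the example."""
    return Fraction(factorial(k) * factorial(n - k), factorial(n))


if __name__ == "__main__":
    n, k = 5, 2
    for xs in sorted(set(permutations([1] * k + [0] * (n - k)))):
        print(xs, sequence_probability(xs, n, k), closed_form(n, k))
```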


Remark Another way to obtain the preceding formula for the joint probability 
mass function is to regard all the n balls as distinguishable from one another. Then, 
since the outcome of the experiment is an ordering of these balls, it follows that 
there are n! equally likely outcomes. Finally, because the number of outcomes having 
special and nonspecial balls in specified places is equal to the number of ways of 
permuting the special and the nonspecial balls among themselves, namely k!(n — k)!, 
we obtain the preceding mass function. 


It is easily seen that if $X_1, X_2, \ldots, X_n$ are exchangeable, then each $X_i$ has the
same probability distribution. For instance, if $X$ and $Y$ are exchangeable discrete
random variables, then

$$P\{X = x\} = \sum_y P\{X = x, Y = y\} = \sum_y P\{X = y, Y = x\} = P\{Y = x\}$$


For example, it follows from Example 8a that the ith ball withdrawn will be special 
with probability k/n, which is intuitively clear, since each of the n balls is equally 
likely to be the ith one selected. 


Example 8b

In Example 8a, let $Y_1$ denote the selection number of the first special ball withdrawn,
let $Y_2$ denote the additional number of balls that are then withdrawn until the second
special ball appears, and, in general, let $Y_i$ denote the additional number of balls
withdrawn after the $(i - 1)$st special ball is selected until the $i$th is selected, $i = 1, \ldots, k$.
For instance, if $n = 4$, $k = 2$ and $X_1 = 1, X_2 = 0, X_3 = 0, X_4 = 1$, then $Y_1 = 1$, $Y_2 = 3$.
Now, $Y_1 = i_1, Y_2 = i_2, \ldots, Y_k = i_k$ if and only if
$X_{i_1} = X_{i_1 + i_2} = \cdots = X_{i_1 + \cdots + i_k} = 1$ and $X_j = 0$
otherwise; thus, from the joint mass function of the $X_i$, we obtain

$$P\{Y_1 = i_1, Y_2 = i_2, \ldots, Y_k = i_k\} = \frac{k!\,(n - k)!}{n!}, \qquad i_1 + \cdots + i_k \le n$$

Hence, the random variables $Y_1, \ldots, Y_k$ are exchangeable. Note that it follows from
this result that the number of cards one must select from a well-shuffled deck until
an ace appears has the same distribution as the number of additional cards one must
select after the first ace appears until the next one does, and so on.


Example 8c

The following is known as Polya's urn model: Suppose that an urn initially contains
$n$ red and $m$ blue balls. At each stage, a ball is randomly chosen, its color is
noted, and it is then replaced along with another ball of the same color. Let $X_i = 1$
if the $i$th ball selected is red and let it equal 0 if the $i$th ball is blue, $i \ge 1$. To
obtain a feeling for the joint probabilities of these $X_i$, note the following special
cases:

$$P\{X_1 = 1, X_2 = 1, X_3 = 0, X_4 = 1, X_5 = 0\} = \frac{n}{n + m} \cdot \frac{n + 1}{n + m + 1} \cdot \frac{m}{n + m + 2} \cdot \frac{n + 2}{n + m + 3} \cdot \frac{m + 1}{n + m + 4}$$

$$= \frac{n(n + 1)(n + 2)\,m(m + 1)}{(n + m)(n + m + 1)(n + m + 2)(n + m + 3)(n + m + 4)}$$

and

$$P\{X_1 = 0, X_2 = 1, X_3 = 0, X_4 = 1, X_5 = 1\} = \frac{m}{n + m} \cdot \frac{n}{n + m + 1} \cdot \frac{m + 1}{n + m + 2} \cdot \frac{n + 1}{n + m + 3} \cdot \frac{n + 2}{n + m + 4}$$

$$= \frac{n(n + 1)(n + 2)\,m(m + 1)}{(n + m)(n + m + 1)(n + m + 2)(n + m + 3)(n + m + 4)}$$

By the same reasoning, for any sequence $x_1, \ldots, x_k$ that contains $r$ ones and $k - r$
zeros, we have

$$P\{X_1 = x_1, \ldots, X_k = x_k\} = \frac{n(n + 1) \cdots (n + r - 1)\, m(m + 1) \cdots (m + k - r - 1)}{(n + m) \cdots (n + m + k - 1)}$$

Therefore, for any value of $k$, the random variables $X_1, \ldots, X_k$ are exchangeable.
An interesting corollary of the exchangeability in this model is that the probability
that the $i$th ball selected is red is the same as the probability that the first
ball selected is red, namely, $\frac{n}{n + m}$. (For an intuitive argument for this initially
nonintuitive result, imagine that all the $n + m$ balls initially in the urn are of different
types. That is, one is a red ball of type 1, one is a red ball of type 2, ..., one is a
red ball of type $n$, one is a blue ball of type 1, and so on, down to the blue ball of
type $m$. Suppose that when a ball is selected it is replaced along with another of
its type. Then, by symmetry, the $i$th ball selected is equally likely to be of any of
the $n + m$ distinct types. Because $n$ of these $n + m$ types are red, the probability
is $\frac{n}{n + m}$.)
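
The urn calculation can be reproduced exactly for small cases. The Python sketch below is an illustration under assumed values ($n = 3$ red and $m = 2$ blue balls); the function names are introduced here. It multiplies the stage-by-stage conditional probabilities with exact fractions, so one can confirm that the two sequences above have the same probability and that the $i$th ball is red with probability $n/(n + m)$.

```python
from fractions import Fraction
from itertools import product


def polya_probability(xs, n, m):
    """Probability of the color sequence xs (1 = red, 0 = blue) in Polya's
    urn starting with n red and m blue balls."""
    red, blue = n, m
    p = Fraction(1)
    for x in xs:
        total = red + blue
        if x == 1:
            p *= Fraction(red, total)
            red += 1   # the red ball is returned along with one more red
        else:
            p *= Fraction(blue, total)
            blue += 1  # the blue ball is returned along with one more blue
    return p


def prob_ith_red(i, n, m):
    """P{X_i = 1}, found by summing over all sequences of length i ending in red."""
    return sum(polya_probability(xs, n, m)
               for xs in product((0, 1), repeat=i) if xs[-1] == 1)


if __name__ == "__main__":
    print(polya_probability((1, 1, 0, 1, 0), 3, 2))
    print(polya_probability((0, 1, 0, 1, 1), 3, 2))
    print([prob_ith_red(i, 3, 2) for i in range(1, 6)])
```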
Our final example deals with continuous random variables that are exchangeable.


Example 8d

Let $X_1, X_2, \ldots, X_n$ be independent uniform $(0, 1)$ random variables, and denote their
order statistics by $X_{(1)}, \ldots, X_{(n)}$. That is, $X_{(j)}$ is the $j$th smallest of $X_1, X_2, \ldots, X_n$.
Also, let

$$Y_1 = X_{(1)}, \qquad Y_i = X_{(i)} - X_{(i-1)}, \quad i = 2, \ldots, n$$

Show that $Y_1, \ldots, Y_n$ are exchangeable.

Solution The transformations

$$y_1 = x_1, \qquad y_i = x_i - x_{i-1}, \quad i = 2, \ldots, n$$

yield

$$x_i = y_1 + \cdots + y_i, \qquad i = 1, \ldots, n$$

As it is easy to see that the Jacobian of the preceding transformations is equal to 1,
from Equation (7.3) we obtain

$$f_{Y_1, \ldots, Y_n}(y_1, y_2, \ldots, y_n) = f(y_1, y_1 + y_2, \ldots, y_1 + \cdots + y_n)$$

where $f$ is the joint density function of the order statistics. Hence, from Equation (6.1),
we obtain that

$$f_{Y_1, \ldots, Y_n}(y_1, y_2, \ldots, y_n) = n!, \qquad 0 < y_1 < y_1 + y_2 < \cdots < y_1 + \cdots + y_n < 1$$

or, equivalently,

$$f_{Y_1, \ldots, Y_n}(y_1, y_2, \ldots, y_n) = n!, \qquad 0 < y_i < 1,\ i = 1, \ldots, n,\ y_1 + \cdots + y_n < 1$$

Because the preceding joint density is a symmetric function of $y_1, \ldots, y_n$, we see that
the random variables $Y_1, \ldots, Y_n$ are exchangeable.
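
One consequence of exchangeability is that the spacings share a common marginal distribution, so in particular their means agree. Since the smallest of $n$ independent uniform $(0, 1)$ variables has mean $1/(n + 1)$, every spacing should have that mean. The Python sketch below is a simulation written for illustration, with an assumed $n = 4$, sample size, and seed; it estimates the mean of each spacing.

```python
import random


def mean_spacings(n=4, n_samples=40_000, seed=0):
    """Estimate E[Y_1], ..., E[Y_n] for the spacings of n uniform (0, 1) values."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(n_samples):
        ordered = sorted(rng.random() for _ in range(n))
        previous = 0.0
        for i, x in enumerate(ordered):
            totals[i] += x - previous  # Y_1 = X_(1), Y_i = X_(i) - X_(i-1)
            previous = x
    return [t / n_samples for t in totals]


if __name__ == "__main__":
    print(mean_spacings())  # each entry should be close to 1 / (n + 1)
```

Equal means are only a necessary condition for exchangeability; the symmetric joint density above is what establishes it.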


Summary 


The joint cumulative probability distribution function of
the pair of random variables $X$ and $Y$ is defined by

$$F(x, y) = P\{X \le x, Y \le y\}, \qquad -\infty < x, y < \infty$$

All probabilities regarding the pair can be obtained from
$F$. To find the individual probability distribution functions
of $X$ and $Y$, use

$$F_X(x) = \lim_{y \to \infty} F(x, y), \qquad F_Y(y) = \lim_{x \to \infty} F(x, y)$$

If $X$ and $Y$ are both discrete random variables, then
their joint probability mass function is defined by

$$p(i, j) = P\{X = i, Y = j\}$$

The individual mass functions are

$$P\{X = i\} = \sum_j p(i, j), \qquad P\{Y = j\} = \sum_i p(i, j)$$

The random variables $X$ and $Y$ are said to be
jointly continuous if there is a function $f(x, y)$, called the
joint probability density function, such that for any two-dimensional
set $C$,

$$P\{(X, Y) \in C\} = \iint_C f(x, y)\,dx\,dy$$

It follows from the preceding formula that

$$P\{x < X < x + dx,\ y < Y < y + dy\} \approx f(x, y)\,dx\,dy$$

If $X$ and $Y$ are jointly continuous, then they are individually
continuous with density functions

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$

The random variables $X$ and $Y$ are independent if, for
all sets $A$ and $B$,

$$P\{X \in A, Y \in B\} = P\{X \in A\}P\{Y \in B\}$$

If the joint distribution function (or the joint probability
mass function in the discrete case, or the joint density function
in the continuous case) factors into a part depending
only on $x$ and a part depending only on $y$, then $X$ and $Y$
are independent.

In general, the random variables $X_1, \ldots, X_n$ are independent
if, for all sets of real numbers $A_1, \ldots, A_n$,

$$P\{X_1 \in A_1, \ldots, X_n \in A_n\} = P\{X_1 \in A_1\} \cdots P\{X_n \in A_n\}$$

If $X$ and $Y$ are independent continuous random variables,
then the distribution function of their sum can be
obtained from the identity

$$F_{X+Y}(a) = \int_{-\infty}^{\infty} F_X(a - y) f_Y(y)\,dy$$

If $X_i$, $i = 1, \ldots, n$, are independent normal random
variables with respective parameters $\mu_i$ and $\sigma_i^2$, $i = 1, \ldots, n$,
then $\sum_{i=1}^{n} X_i$ is normal with parameters $\sum_{i=1}^{n} \mu_i$ and
$\sum_{i=1}^{n} \sigma_i^2$.

If $X_i$, $i = 1, \ldots, n$, are independent Poisson random
variables with respective parameters $\lambda_i$, $i = 1, \ldots, n$, then
$\sum_{i=1}^{n} X_i$ is Poisson with parameter $\sum_{i=1}^{n} \lambda_i$.

If $X$ and $Y$ are discrete random variables, then the
conditional probability mass function of $X$ given that $Y = y$
is defined by

$$P\{X = x \mid Y = y\} = \frac{p(x, y)}{p_Y(y)}$$

where $p$ is their joint probability mass function. Also, if $X$
and $Y$ are jointly continuous with joint density function $f$,
then the conditional probability density function of $X$ given
that $Y = y$ is given by

$$f_{X \mid Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}$$

The ordered values $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ of a set of
independent and identically distributed random variables
are called the order statistics of that set. If the random variables
are continuous and have density function $f$, then the
joint density function of the order statistics is

$$f(x_1, \ldots, x_n) = n!\, f(x_1) \cdots f(x_n), \qquad x_1 \le x_2 \le \cdots \le x_n$$

The random variables $X_1, \ldots, X_n$ are called exchangeable
if the joint distribution of $X_{i_1}, \ldots, X_{i_n}$ is the same for every
permutation $i_1, \ldots, i_n$ of $1, \ldots, n$.


Problems


6.1. A coin is tossed three times. Find the joint probability
mass function of $X$ and $Y$ when

(a) $X$ is the number of heads in all three tosses, and $Y$ is
the number of tails;

(b) $X$ is the number of heads on the first two tosses, and $Y$
is the number of heads on all three tosses;

(c) $X$ is the absolute difference between the number of
heads and the number of tails in all three tosses, and $Y$
is the number of tails.


6.2. Suppose that 3 balls are chosen without replacement
from an urn consisting of 5 white and 8 red balls. Let $X_i$
equal 1 if the $i$th ball selected is white, and let it equal 0
otherwise. Give the joint probability mass function of

(a) $X_1, X_2$;
(b) $X_1, X_2, X_3$.


6.3. In Problem 6.2, suppose that the white balls are numbered,
and let $Y_i$ equal 1 if the $i$th white ball is selected and
0 otherwise. Find the joint probability mass function of

(a) $Y_1, Y_2$;
(b) $Y_1, Y_2, Y_3$.


6.4. Repeat Problem 6.2 when the ball selected is replaced 
in the urn before the next selection. 


6.5. Repeat Problem 6.3a when the ball selected is 
replaced in the urn before the next selection. 


6.6. The severity of a certain cancer is designated by one 
of the grades 1,2,3,4 with 1 being the least severe and 4 
the most severe. If X is the score of an initially diagnosed 
patient and Y the score of that patient after three months 
of treatment, hospital data indicates that p(i,j) = P(X = 
i, Y = j) is given by 


$p(1,1) = .08$, $p(1,2) = .06$, $p(1,3) = .04$, $p(1,4) = .02$
$p(2,1) = .06$, $p(2,2) = .12$, $p(2,3) = .08$, $p(2,4) = .04$
$p(3,1) = .03$, $p(3,2) = .09$, $p(3,3) = .12$, $p(3,4) = .06$
$p(4,1) = .01$, $p(4,2) = .03$, $p(4,3) = .07$, $p(4,4) = .09$

(a) Find the probability mass functions of $X$ and of $Y$;

(b) Find E[X] and E[Y]. 

(c) Find Var(X) and Var(Y). 

6.7. Consider a sequence of independent Bernoulli trials, 
each of which is a success with probability $p$. Let $X_1$ be the
number of failures preceding the first success, and let $X_2$
be the number of failures between the first two successes.
Find the joint mass function of $X_1$ and $X_2$.


6.8. The joint probability density function of X and Y is 
given by 


$$f(x, y) = c(x + y), \qquad 0 \le x \le 1,\ 0 < y < 1$$


(a) Find c. 
(b) Find the marginal densities of X and Y. 
(c) Find E[XY]. 


6.9. The joint probability density function of X and Y is 
given by 


f@y)=cxe’, OSysxsl 


(a) Find c. 

(b) Find the marginal density of X. 

(c) Find the marginal density of Y. 

(d) Find P{Y < 3|X < 4}. 

(e) Find E[X]. ~ 

(f) Find E[Y]. 

6.10. The joint probability density function of X and Y is 
given by 


fy) =40n2)?2-°7), OS x <105yes1 


Find (a) P(X < a} and (b) P(X + Y < 4). 


6.11. In Example 1d, verify that $f(x, y) = 2e^{-x}e^{-2y}$, $0 < x < \infty$, $0 < y < \infty$,
is indeed a joint density function.
That is, check that $f(x, y) \ge 0$, and that

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1.$$


6.12. The number of claims received by a car insurance 
company in a month is a Poisson random variable with 
mean 20. Seventy percent of policies pertain to vehicle 
type A, and 30 percent of policies pertain to vehicle type 
B. Compute the conditional probability that more than 10 
claims received are for vehicle type A given that at least 5 
of the claims received are for vehicle type B. What assump- 
tions have you made? 


6.13. A man and a woman agree to meet at a certain loca- 
tion about 12:30 P.M. If the man arrives at a time uni- 
formly distributed between 12:15 and 12:45, and if the 
woman independently arrives at a time uniformly dis- 
tributed between 12:00 and 1 P.M., find the probability that 
the first to arrive waits no longer than 5 minutes. What is 
the probability that the man arrives first? 


6.14. An ambulance travels back and forth at a constant 
speed along a road of length L. At a certain moment of 


time, an accident occurs at a point uniformly distributed on 
the road. [That is, the distance of the point from one of the 
fixed ends of the road is uniformly distributed over (0, L).] 
Assuming that the ambulance’s location at the moment of 
the accident is also uniformly distributed, and assuming 
independence of the variables, compute the distribution of 
the distance of the ambulance from the accident. 


6.15. The random vector (X, Y) is said to be uniformly dis- 
tributed over a region R in the plane if, for some constant 
c, its joint density is 


$$f(x, y) = \begin{cases} c & \text{if } (x, y) \in R \\ 0 & \text{otherwise} \end{cases}$$


(a) Show that 1/c = area of region R. 


Suppose that (X, Y) is uniformly distributed over the 
square centered at (0, 0) and with sides of length 2. 

(b) Show that X and Y are independent, with each being 
distributed uniformly over (—1, 1). 

(c) What is the probability that $(X, Y)$ lies in the circle
of radius 1 centered at the origin? That is, find
$P\{X^2 + Y^2 \le 1\}$.


6.16. Suppose that $n$ points are independently chosen at
random on the circumference of a circle, and we want the
probability that they all lie in some semicircle. That is, we
want the probability that there is a line passing through the
center of the circle such that all the points are on one side
of that line, as shown in the following diagram:


Let $P_1, \ldots, P_n$ denote the $n$ points. Let $A$ denote the event
that all the points are contained in some semicircle, and
let $A_i$ be the event that all the points lie in the semicircle
beginning at the point $P_i$ and going clockwise for
$180^\circ$, $i = 1, \ldots, n$.

(a) Express $A$ in terms of the $A_i$.
(b) Are the $A_i$ mutually exclusive?
(c) Find $P(A)$.


6.17. A circle of radius R is divided into four equally 
sized sectors. Pick three independently and uniformly dis- 
tributed points in the circle. What is the probability that 
the three points lie in different sectors? 


6.18. Let $X_1$ and $X_2$ be independent Poisson random variables
with each $X_i$ having parameter $\lambda_i$. Find


(a) P(X1.X2 = 0); 
(b) P(X, + X2 = 1); 
(c) P(X, + X2 > 1). 


6.19. Show that f(x,y) = —— 0 < y < x < 1, where 
B(3, 2) is the beta function evaluated at 3 and 2, is a joint 
density function for two random variables X and Y. Find 
(a) the marginal density of X; 

(b) the marginal density of Y; 

(c) E[X]; 

(d) E[Y]. 


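6.20. The joint density of $X$ and $Y$ is given by

$$f(x, y) = \begin{cases} x e^{-(x + y)} & x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases}$$

Are $X$ and $Y$ independent? If, instead, $f(x, y)$ were
given by

$$f(x, y) = \begin{cases} 2 & 0 < x < y,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

would $X$ and $Y$ be independent?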
6.21. Let

$$f(x, y) = \frac{1}{2} \sin(x + y), \qquad 0 \le x \le \frac{\pi}{2},\ 0 \le y \le \frac{\pi}{2}$$

(a) Show that f(x, y) is a joint probability density function. 
(b) Find E[X]. 
(c) Find E[cos Y]. 


6.22. The joint density function of $X$ and $Y$ is

$$f(x, y) = \begin{cases} x + y & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

(a) Are $X$ and $Y$ independent?
(b) Find the density function of $X$.
(c) Find $P\{X + Y < 1\}$.


6.23. The random variables $X$ and $Y$ have joint density
function

$$f(x, y) = 12xy(1 - x), \qquad 0 < x < 1,\ 0 < y < 1$$

and equal to 0 otherwise.

(a) Are $X$ and $Y$ independent?
(b) Find $E[X]$.
(c) Find $E[Y]$.
(d) Find $\mathrm{Var}(X)$.
(e) Find $\mathrm{Var}(Y)$.


6.24. Consider independent trials, each of which results in
outcome 0, with probability $p$, and a non-zero outcome
with value, with probability $1 - p$. Let $N$ represent the
number of trials needed to obtain a number that is not
equal to 0, and let $X = 1/N$ be that outcome.

(a) Find $P(N = n)$.
(b) Show that $E[X] = (-p \ln p)/(1 - p)$.
(c) What is the probability $P(N = n, X = x)$?
(d) What is the probability $P(N = n \mid X = x)$?


6.25. Suppose that $10^6$ people arrive at a service station
at times that are independent random variables, each of
which is uniformly distributed over $(0, 10^6)$. Let $N$ denote
the number that arrive in the first hour. Find an approximation
for $P\{N = i\}$.


6.26. Suppose that A, B, C, are independent random vari- 
ables, each being uniformly distributed over (0, 1). 


(a) What is the joint cumulative distribution function of A, 
B,C? 

(b) What is the probability that all of the roots of the equation
$Ax^2 + Bx + C = 0$ are real?


6.27. Let X; and X2 be independent and uniformly dis- 
tributed on [0,1]. Find the cumulative distribution func- 
tion of Z = X1,X> and P{Z > .5}. 


6.28. The time that it takes to service a car is an exponen- 
tial random variable with rate 1. 


(a) If A. J. brings his car in at time 0 and M. J. brings her
car in at time $t$, what is the probability that M. J.'s car is
ready before A. J.’s car? (Assume that service times are 
independent and service begins upon arrival of the car.) 
(b) If both cars are brought in at time 0, with work start- 
ing on M. J.’s car only when A. J.’s car has been completely 
serviced, what is the probability that M. J.’s car is ready 
before time 2? 


6.29. The total rain water collected in a reservoir in a 
year is gamma distributed with mean 1000 liters and stan- 
dard deviation 200. Assuming that the rain water collected 
yearly is independent, what is the probability that 


(a) the total rainwater collected by the reservoir in 2 years 
is less than 2500? 


(b) more than average collection of rain water happens in 
at least 3 of the next 5 years? 


6.30. A manufacturing plant uses two machines in two
stages. The service time (in minutes) of the first machine
is normally distributed with mean 1 and standard deviation
.05. Independent of the first machine, the service time
of the second is normally distributed with mean .95 and 
standard deviation .02. Find the probability that 


(a) the first machine finishes the task faster than the sec- 
ond machine; 


(b) the total service time for both machines is less than 2.1. 


6.31. According to the U.S. National Center for Health 
Statistics, 25.2 percent of males and 23.6 percent of females 
never eat breakfast. Suppose that random samples of 200 
men and 200 women are chosen. Approximate the proba- 
bility that 


(a) at least 110 of these 400 people never eat breakfast; 
(b) the number of the women who never eat breakfast is 
at least as large as the number of the men who never eat 
breakfast. 


6.32. Daily log returns on stock are independent normal 
random variables with mean 0 and standard deviation .01. 


(a) Find the probability that gains are made on each of 4 
consecutive days. 

(b) Find the probability that the total log returns on 4 con- 
secutive days is greater than .02. 


6.33. Let $X_1$ and $X_2$ both be the sum of 10000 Bernoulli
trials, with the probability of a successful outcome being
$\frac{1}{2}$. Use the normal approximation to the binomial to determine
which probability is larger:

(a) $P(X_1 < 5000)$ or $P(X_1 + X_2 < 10000)$;
(b) $P(X_1 > 5100)$ or $P(X_1 + X_2 > 10300)$.


6.34. Suppose $X$ and $Y$ are independent normal random
variables with parameters $(\mu_1, \sigma_1)$ and $(\mu_2, \sigma_2)$, respectively.
Find $x$ such that $P(X - Y > x) = P(X + Y > a)$
for some $a$.


6.35. Teams 1, 2, 3, 4 are all scheduled to play each of the
other teams 10 times. Whenever team $i$ plays team $j$, team
$i$ is the winner with probability $P_{i,j}$, where

$P_{1,2} = .6$, $P_{1,3} = .7$, $P_{1,4} = .75$
$P_{2,1} = .4$, $P_{2,3} = .6$, $P_{2,4} = .70$


(a) Approximate the probability that team 1 wins at least 
20 games. 

Suppose we want to approximate the probability that team 
2 wins at least as many games as does team 1. To do so, let 
X be the number of games that team 2 wins against team 
1, let Y be the total number of games that team 2 wins 
against teams 3 and 4, and let Z be the total number of 
games that team 1 wins against teams 3 and 4. 


(b) Are $X$, $Y$, $Z$ independent?


(c) Express the event that team 2 wins at least as many 
games as does team 1 in terms of the random variables 
X,Y,Z. 

(d) Approximate the probability that team 2 wins at least 
as many games as team 1. 


Hint: Approximate the distribution of any binomial ran- 
dom variable by a normal with the same mean and vari- 
ance. 


6.36. Let X1,..., X19 be independent with the same con- 
tinuous distribution function F, and let m be the median 
of that distribution. That is, F(m) = .5. 


(a) If $N$ is the number of the values $X_1, \ldots, X_{10}$ that are
less than $m$, what type of random variable is $N$?


(b) Let Xq) < X(2) < --- < X10) be the values _X1,..., X40 
arranged in increasing order. That is, X(j is, for i = 
1,...,10, the i" smallest of X1,..., X19. Find P(XQ) <m < 
X(g)). 


6.37. An experiment is successful with probability .8. 
(a) What is the probability that 2 runs of the experiment 
yield no success? 


(b) What is the probability that 10 runs of the exper- 
iment yield higher-than-average success? Explain your 
reasoning. 


6.38. The number of defects in a piece of fabric is Poisson 
with an average of 2 per square meter. What is 


(a) the probability that there are no defects on one square 
meter of the fabric? 


(b) the probability that there are more than 5 defects on 5 
square meters of the fabric? 


(c) the expected number of defects in 2 square meters of 
fabric given that there is one? 


6.39. In Problem 6.4, calculate the conditional probability 
mass function of X; given that 

(a) X2 = 1; 

(b) X2 = 0. 

6.40. In Problem 6.3, calculate the conditional probability 
mass function of Y; given that 

(a) Y2 = 1; 

(b) Y2 = 0. 


6.41. The discrete integer valued random variables $X$, $Y$, $Z$
are independent if for all $i, j, k$

$$P\{X = i, Y = j, Z = k\} = P(X = i)P(Y = j)P(Z = k)$$

Show that if $X$, $Y$, $Z$ are independent then $X$ and $Y$ are
independent. That is, show that the preceding implies that

$$P(X = i, Y = j) = P(X = i)P(Y = j)$$


6.42. Choose a number $X$ using a standard normal distribution.
Choose a second number $Y$ using a truncated
normal distribution on $(-\infty, X)$.

(a) Find the joint density function of X and Y. 

(b) Deduce the marginal density function of Y. 

(c) Confirm that the marginal density in (b) integrates to 1. 
6.43. Let X and Y be, respectively, the smallest and the 
largest values of two uniformly distributed values on (0,1). 


Find the conditional density function of Y given X = x, 
x € (0,1). Also, show that X and Y are not independent. 


6.44. The joint probability mass function of X and Y is 
given by 
$$p(1,1) = .9, \qquad p(1,2) = .04$$
$$p(2,1) = .03, \qquad p(2,2) = .03$$


(a) Compute the conditional mass function of X given
$Y = i$, $i = 1, 2$.

(b) Are X and Y independent? 

(c) Compute $P\{X - Y = 0\}$, $P\{X + Y < 3\}$, and $P\{X/Y < 1\}$.


6.45. The joint density function of X and Y is given by 


$$f(x, y) = x e^{-x(y+1)}, \qquad x > 0,\ y > 0$$


(a) Find the conditional density of X, given Y=y, and that 
of Y, given X = x. 
(b) Find the density function of Z = XY. 


6.46. The joint density of X and Y is 
fy) =exy? O<x<10<y< xa 
Find c and the conditional distribution of X, given Y = y. 


6.47. Packages of different types arriving in a processing
department have a processing parameter $\mu$, depending on
type. The service time for processing a package with processing
parameter $\mu$ is exponentially distributed with service
rate $\mu$. The parameter $\mu$ is also assumed to be uniformly
distributed on $(a_1, a_2)$, $a_2 > a_1 > 0$. If a particular package
takes more than $b$, $b > 0$, to process, find the conditional
density of the processing parameter. Based on this information,
determine the expected value of the processing
parameter that the next package identical to this one will
have.


6.48. Let $X_1$ and $X_2$ be independent random variables that
are exponentially distributed with parameter $\lambda$. Compute
the probability that the largest of the two is twice as great 
as the other one. 


6.49. A complex machine is able to operate effectively as 
long as at least 3 of its 5 motors are functioning. If each 
motor independently functions for a random amount of 
time with density function $f(x) = xe^{-x}$, $x > 0$, compute
the density function of the length of time that the machine 
functions. 


6.50. If 3 trucks break down at points randomly dis- 
tributed on a road of length L, find the probability that no 
2 of the trucks are within a distance d of each other when 
$d \le L/2$.


6.51. Consider a sample of size 5 from a uniform distribu- 
tion over (0, 1). Compute the probability that the median 


is in the interval ( 3). 


6.52. Let $X_1, \ldots, X_n$ be independent and identically distributed
geometric random variables with parameter $p$
related to the probability of a successful trial. Hence, given
a positive integer $a$, find:


(a) $P\{\min(X_1, \ldots, X_n) = a\}$;
(b) $P\{\max(X_1, \ldots, X_n) = a\}$.


6.53. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of a
set of $n$ independent uniform (0, 1) random variables. Find
the conditional distribution of $X_{(n)}$ given that
$X_{(1)} = s_1, X_{(2)} = s_2, \ldots, X_{(n-1)} = s_{n-1}$.


6.54. Let $Z_1$ and $Z_2$ be independent standard normal random
variables. Given $X = Z_1 + Z_2$ and $Y = Z_2 - Z_1$,
show that X and Y are independent normally distributed 
random variables with mean 0 and variance 2. 


6.55. Derive the distribution of the range of a sample of 
size 2 from a distribution having density function f(x) = 
2x,0 <x <1. 


6.56. Let X and Y denote the coordinates of a point uniformly
chosen in the circle of radius 1 centered at the origin.
That is, their joint density is

$$f(x, y) = \frac{1}{\pi}, \qquad x^2 + y^2 \le 1$$

Find the joint density function of the polar coordinates
$R = (X^2 + Y^2)^{1/2}$ and $\Theta = \tan^{-1} Y/X$.


6.57. If X and Y are independent random variables both 
uniformly distributed over (0, 1), find the joint density 


function of $R = \sqrt{X^2 + Y^2}$, $\Theta = \tan^{-1} Y/X$.


6.58. If U is uniform on $(0, 2\pi)$ and Z, independent of U,
is exponential with rate 1, show directly (without using the
results of Example 7b) that X and Y defined by

$$X = \sqrt{2Z}\,\cos U$$
$$Y = \sqrt{2Z}\,\sin U$$


are independent standard normal random variables. 
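
The construction in this exercise can be explored numerically before attempting the proof. The following is only a simulation sketch, not a derivation: it assumes Python's standard `random` module, and the helper names `box_muller_pair` and `sample_moments` are illustrative. If the claim holds, the sample means should be near 0, the sample variances near 1, and the sample covariance near 0.

```python
import math
import random

def box_muller_pair(rng):
    """Build (X, Y) from U ~ uniform(0, 2*pi) and Z ~ exponential(rate 1)."""
    u = rng.uniform(0.0, 2.0 * math.pi)
    z = rng.expovariate(1.0)          # exponential with rate 1
    r = math.sqrt(2.0 * z)
    return r * math.cos(u), r * math.sin(u)

def sample_moments(n, seed=0):
    """Return (mean_x, mean_y, var_x, var_y, cov_xy) from n simulated pairs."""
    rng = random.Random(seed)
    pairs = [box_muller_pair(rng) for _ in range(n)]
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    return mean_x, mean_y, var_x, var_y, cov_xy
```

Zero covariance alone does not establish independence, so the simulation is a sanity check only; the exercise still requires computing the joint density.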


6.59. X and Y have joint density function

$$f(x, y) = \frac{1}{x^2 y^2}, \qquad x \ge 1,\ y \ge 1$$

(a) Compute the joint density function of $U = XY$, $V = X/Y$.
(b) What are the marginal densities?


6.60. If X and Y are independent and identically
distributed uniform random variables on (0, 1),
compute the joint density of


(a) $U = X + Y$, $V = X/Y$;
(b) $U = X$, $V = X/Y$;
(c) $U = X + Y$, $V = X/(X + Y)$.


6.61. Repeat Problem 6.60 when X and Y are independent
exponential random variables, each with parameter $\lambda = 1$.


6.62. Let $X_1$ and $X_2$ be independent exponentially distributed
random variables with parameters $\lambda_1$ and $\lambda_2$,
respectively. Find the joint density function of $Y_1 = X_1$
and $Y_2 = X_1 + X_2$.


Theoretical Exercises 


6.1. F(x, y) is the joint density function of random vari- 
ables X and Y. If F(x, y) = F(y,x) what can you say about 
the distributions of the two random variables? 


6.2. Suppose that X and Y are integer valued random
variables and have a joint distribution function
$F(i, j) = P(X \le i, Y \le j)$.


(a) Give an expression, in terms of the joint distribution
function, for $P(X = i, Y \le j)$.
(b) Give an expression, in terms of the joint distribution
function, for $P(X = i, Y = j)$.


6.3. Suggest a procedure for using Buffon’s needle problem
to estimate $\pi$. Surprisingly enough, this was once a
common method of evaluating $\pi$.


6.4. Solve Buffon’s needle problem when $L > D$.

ANSWER: $\dfrac{2L}{\pi D}(1 - \sin\theta) + 2\theta/\pi$, where $\cos\theta = D/L$.


6.5. Given continuous independent random variables X
and Y with probability density functions $f_X$ and $f_Y$, find the
density functions and distribution functions for the random
variables $Z = Xe^{Y}$ and $W = X(Y^2 + 1)$ in terms
of $f_X$ and $f_Y$.


6.6. X and Y are continuous random variables with joint
density function $f(x, y)$. Show that the density function of
$X - Y$ is given by

$$f_{X-Y}(z) = \int_{-\infty}^{\infty} f(x, x - z)\,dx$$


6.63. Let X, Y, and Z be independent random variables
having identical distribution functions $f(x) = 1 - .5x$, $0 < x < 2$.
Derive the joint distribution of $U = X + Y$, $V = e^{Y}$,
$W = X + Y + Z$.


6.64. In Example 8b, let $Y_{k+1} = n + 1 - \sum_{i=1}^{k} Y_i$. Show
that $Y_1, \ldots, Y_k, Y_{k+1}$ are exchangeable. Note that $Y_{k+1}$ is
the number of balls one must observe to obtain a special
ball if one considers the balls in their reverse order of withdrawal.


6.65. Consider an urn containing $n$ balls numbered
$1, \ldots, n$, and suppose that $k$ of them are randomly withdrawn.
Let $X_i$ equal 1 if ball number $i$ is removed and let
$X_i$ be 0 otherwise. Show that $X_1, \ldots, X_n$ are exchangeable.


Show that independence implies the equation

$$F_{X-Y}(z) = 1 - \int_{-\infty}^{\infty} F_Y(x - z)\,f_X(x)\,dx$$


6.7. (a) If X has a gamma distribution with parameters
$(t, \lambda)$, what is the distribution of $cX$, $c > 0$?


(b) Show that

$$\frac{1}{2\lambda}\,\chi^2_{2n}$$

has a gamma distribution with parameters $n$, $\lambda$ when $n$ is a
positive integer and $\chi^2_{2n}$ is a chi-squared random variable
with $2n$ degrees of freedom.


6.8. Let X and Y be independent continuous random variables
with respective hazard rate functions $\lambda_X(t)$ and $\lambda_Y(t)$,
and set $W = \min(X, Y)$.


(a) Determine the distribution function of W in terms of
those of X and Y.


(b) Show that $\lambda_W(t)$, the hazard rate function of W, is
given by

$$\lambda_W(t) = \lambda_X(t) + \lambda_Y(t)$$

6.9. Show that

$$f(x_1, x_2, \ldots, x_n) = \frac{n!}{(1 + x_1 + x_2 + \cdots + x_n)^{n+1}}$$

for $x_1 \ge 0, x_2 \ge 0, \ldots, x_n \ge 0$ constitutes a joint density
function. Compute $P\{X_1 \le X_2, X_2 \le X_3\}$ where $X_1$, $X_2$,
and $X_3$ follow the joint density function above.


6.10. The lifetimes of batteries are independent exponential
random variables, each having parameter $\lambda$. A flashlight
needs 2 batteries to work. If one has a flashlight and
a stockpile of $n$ batteries, what is the distribution of time
that the flashlight can operate?


6.11. Let X1, X2, X3, X4, X5 be independent continuous 
random variables having a common distribution function 
F and density function f, and set 


$$I = P\{X_1 < X_2 < X_3 < X_4 < X_5\}$$


(a) Show that $I$ does not depend on $F$.

Hint: Write $I$ as a five-dimensional integral and make the
change of variables $u_i = F(x_i)$, $i = 1, \ldots, 5$.

(b) Evaluate $I$.

(c) Give an intuitive explanation for your answer to (b). 


6.12. Show that the jointly continuous (discrete) random
variables $X_1, \ldots, X_n$ are independent if and only if their
joint probability density (mass) function $f(x_1, \ldots, x_n)$ can
be written as

$$f(x_1, \ldots, x_n) = \prod_{i=1}^{n} g_i(x_i)$$

for nonnegative functions $g_i(x)$, $i = 1, \ldots, n$.


6.13. In Example 5e, we computed the conditional density 
of a success probability for a sequence of trials when the 
first n + m trials resulted in n successes. Would the condi- 
tional density change if we specified which n of these trials 
resulted in successes? 


6.14. X and Y are independent geometrically distributed 
random variables both with parameter p. 


(a) Work out P{X > Y}. Hence derive an expression for 
P{X = Y}.
(b) Compute P{X = 2Y|X > Y}. 


6.15. Consider a sequence of independent trials, with each 
trial being a success with probability p. Given that the kth 
success occurs on trial n, show that all possible outcomes 
of the first n — 1 trials that consist of k — 1 successes and 
n — k failures are equally likely. 


6.16. The number of particles N arriving inside a closed
chamber within a fixed interval of time is Poisson distributed
with parameter $\lambda$. Once inside the chamber, n
particles go through a process from which the number X 
of disintegrating particles follows a binomial distribution 
with parameters (n,p). Compute P{X = k} showing that 
X itself is Poisson. 


6.17. Suppose that $X_i$, $i = 1, 2, 3$ are independent Poisson
random variables with respective means $\lambda_i$, $i = 1, 2, 3$.
Let $X = X_1 + X_2$ and $Y = X_2 + X_3$. The random
vector X, Y is said to have a bivariate Poisson distribution.
Find its joint probability mass function. That is, find
$P\{X = n, Y = m\}$.


6.18. X and Y are integer valued random variables. Prove
that

$$\frac{P(X = i \text{ or } Y = j)}{P(X = i, Y = j)} = \frac{1}{P(X = i \mid Y = j)} + \frac{1}{P(Y = j \mid X = i)} - 1$$

for all integers $i$, $j$.


6.19. Given X1, X2, X3, X4 are four positive, independent, 
identically distributed random variables with a shared con- 
tinuous distribution, find the following probabilities 

(a) Piya, < 4th 

(b) P{X1 < X2, X3 < X4 | X2 < X3};

(c) P{X, > X2 > X3|X2 > X3}. 


6.20. The random variable X is exponentially distributed
with parameter $\lambda$. Work out the following conditional distributions:


(a) $P\{X \mid X > t\}$;
(b) $P\{X \mid X < t\}$.


6.21. Suppose that W, the amount of moisture in the air
on a given day, is a gamma random variable with parameters
$(t, \beta)$. That is, its density is $f(w) = \beta e^{-\beta w}(\beta w)^{t-1}/\Gamma(t)$,
$w > 0$. Suppose also that given that $W = w$, the number
of accidents during that day (call it N) has a Poisson
distribution with mean $w$. Show that the conditional distribution
of W given that $N = n$ is the gamma distribution
with parameters $(t + n, \beta + 1)$.


6.22. Let W be a gamma random variable with parameters
$(t, \beta)$, and suppose that conditional on $W = w$,
$X_1, X_2, \ldots, X_n$ are independent exponential random variables
with rate $w$. Show that the conditional distribution
of W given that $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ is gamma
with parameters $\left(t + n,\ \beta + \sum_{i=1}^{n} x_i\right)$.


6.23. A rectangular array of $mn$ numbers arranged in $n$
rows, each consisting of $m$ columns, is said to contain a saddlepoint
if there is a number that is both the minimum of
its row and the maximum of its column. For instance, in
the array

     1     3     2
     0    -2     6
    .5    12     3

the number 1 in the first row, first column is a saddlepoint.
The existence of a saddlepoint is of significance in the theory
of games. Consider a rectangular array of numbers as
described previously and suppose that there are two individuals,
A and B, who are playing the following game: A
is to choose one of the numbers $1, 2, \ldots, n$ and B one of the
numbers $1, 2, \ldots, m$. These choices are announced simultaneously,
and if A chose $i$ and B chose $j$, then A wins from
B the amount specified by the number in the $i$th row, $j$th
column of the array. Now suppose that the array contains
a saddlepoint, say the number in the row $r$ and column
$k$; call this number $x_{rk}$. Now if player A chooses row $r$,
then that player can guarantee herself a win of at least
$x_{rk}$ (since $x_{rk}$ is the minimum number in the row $r$). On
the other hand, if player B chooses column $k$, then he can
guarantee that he will lose no more than $x_{rk}$ (since $x_{rk}$ is
the maximum number in the column $k$). Hence, as A has
a way of playing that guarantees her a win of $x_{rk}$ and as B
has a way of playing that guarantees he will lose no more
than $x_{rk}$, it seems reasonable to take these two strategies
as being optimal and declare that the value of the game to
player A is $x_{rk}$.

If the $nm$ numbers in the rectangular array described are
independently chosen from an arbitrary continuous distribution,
what is the probability that the resulting array will
contain a saddlepoint?
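
The saddlepoint definition used in this exercise translates directly into a brute-force search. The sketch below is illustrative only: the function name `saddlepoints` is hypothetical, and the example array's third row is read as .5, 12, 3, which is an assumption about the printed array. It returns every position whose entry is both the minimum of its row and the maximum of its column.

```python
def saddlepoints(array):
    """Return 0-based (row, column) positions of all saddlepoints.

    An entry is a saddlepoint when it is the minimum of its row and
    the maximum of its column.
    """
    found = []
    for r, row in enumerate(array):
        row_min = min(row)
        for c, value in enumerate(row):
            col_max = max(other_row[c] for other_row in array)
            if value == row_min and value == col_max:
                found.append((r, c))
    return found

# The array from the exercise (third row assumed to read .5, 12, 3).
example = [
    [1, 3, 2],
    [0, -2, 6],
    [0.5, 12, 3],
]
```

Calling `saddlepoints(example)` reports the single position `(0, 0)`, the entry 1 in the first row and first column. A helper like this could also be used to estimate the requested probability by simulation, although the exercise asks for an exact answer.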


6.24. If X is exponential with rate $\lambda$, find
$P\{[X] = n, X - [X] \le x\}$, where $[x]$ is defined as the largest integer less
than or equal to $x$. Can you conclude that $[X]$ and $X - [X]$
are independent?


6.25. Suppose that $F(x)$ is a cumulative distribution function.
Show that (a) $F^n(x)$ and (b) $1 - [1 - F(x)]^n$ are
also cumulative distribution functions when $n$ is a positive
integer.

Hint: Let $X_1, \ldots, X_n$ be independent random variables
having the common distribution function $F$. Define random
variables Y and Z in terms of the $X_i$ so that
$P\{Y \le x\} = F^n(x)$ and $P\{Z \le x\} = 1 - [1 - F(x)]^n$.


6.26. Show that if $n$ people are distributed at random along
a road $L$ miles long, then the probability that no 2 people
are less than a distance $D$ miles apart is, when $D \le L/(n - 1)$,
$[1 - (n - 1)D/L]^n$. What if $D > L/(n - 1)$?


6.27. Suppose that $X_1, \ldots, X_n$ are independent exponential
random variables with rate $\lambda$. Find


(a) $f_{X_1 \mid X_1 + \cdots + X_n}(x \mid \theta)$, the conditional density of $X_1$ given
that $X_1 + \cdots + X_n = \theta$;
(b) $P\{X_1 < x \mid X_1 + \cdots + X_n = \theta\}$.


6.28. Establish Equation (6.2) by differentiating Equa- 
tion (6.4). 


6.29. Show that the median of a sample of size 2n + 1 from 
a uniform distribution on (0, 1) has a beta distribution with 
parameters $(n + 1, n + 1)$.


6.30. Suppose that $X_1, \ldots, X_n$ are independent and identically
distributed continuous random variables. For
$A = \{X_1 < \cdots < X_j > X_{j+1} > \cdots > X_n\}$, find $P(A)$. That is, find
the probability that the function $X(i) = X_i$, $i = 1, \ldots, n$, is
a unimodal function with maximal value $X(j)$. Hint: Write

$$A = \{\max(X_1, \ldots, X_j) = \max(X_1, \ldots, X_n),\ X_1 < \cdots < X_j,\ X_{j+1} > \cdots > X_n\}$$


6.31. Compute the density of the range of a sample of size 
n from a continuous distribution having density function f. 


6.32. Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ be the ordered values
of $n$ independent uniform (0, 1) random variables. Prove
that for $1 \le k \le n + 1$,

$$P\{X_{(k)} - X_{(k-1)} > t\} = (1 - t)^n$$

where $X_{(0)} = 0$, $X_{(n+1)} = 1$, and $0 < t < 1$.


6.33. Let $X_1, \ldots, X_n$ be a set of independent and identically
distributed continuous random variables having distribution
function $F$, and let $X_{(i)}$, $i = 1, \ldots, n$ denote their
ordered values. If X, independent of the $X_i$, $i = 1, \ldots, n$,
also has distribution $F$, determine


(a) $P\{X > X_{(n)}\}$;
(b) $P\{X > X_{(1)}\}$;
(c) $P\{X_{(i)} < X < X_{(j)}\}$, $1 \le i < j \le n$.


6.34. Let $X_1, \ldots, X_n$ be independent and identically distributed
random variables having distribution function $F$
and density $f$. The quantity $M = [X_{(1)} + X_{(n)}]/2$, defined
to be the average of the smallest and largest values in
$X_1, \ldots, X_n$, is called the midrange of the sequence. Show
that its distribution function is

$$F_M(m) = n \int_{-\infty}^{m} [F(2m - x) - F(x)]^{n-1} f(x)\,dx$$


6.35. Let $X_1, \ldots, X_n$ be independent uniform (0, 1) random
variables. Let $R = X_{(n)} - X_{(1)}$ denote the range and
$M = [X_{(n)} + X_{(1)}]/2$ the midrange of $X_1, \ldots, X_n$. Compute
the joint density function of R and M.


6.36. If X and Y are independent standard normal random 
variables, determine the joint density function of 


Then use your result to show that X/Y has a Cauchy 
distribution. 


6.37. Suppose that (X, Y) has a bivariate normal distribution
with parameters $\mu_x, \mu_y, \sigma_x, \sigma_y, \rho$.


(a) Show that $\left(\dfrac{X - \mu_x}{\sigma_x}, \dfrac{Y - \mu_y}{\sigma_y}\right)$ has a bivariate normal distribution
with parameters $0, 1, 0, 1, \rho$.
(b) What is the joint distribution of $(aX + b, cY + d)$?


6.38. Suppose that X has a beta distribution with parame- 
ters (a, b), and that the conditional distribution of N given 
that X = x is binomial with parameters (n + m,x). Show 
that the conditional density of X given that N = n is 
the density of a beta random variable with parameters 
(n + a,m + b). N is said to be a beta binomial random 
variable. 


6.39. Consider an experiment with $n$ possible outcomes,
having respective probabilities $P_1, \ldots, P_n$, $\sum_{i=1}^{n} P_i = 1$,
and suppose we want to assume a probability distribution
on the probability vector $(P_1, \ldots, P_n)$. Because $\sum_{i=1}^{n} P_i = 1$,
we cannot define a density on $P_1, \ldots, P_n$, but what we
can do is to define one on $P_1, \ldots, P_{n-1}$ and then take
$P_n = 1 - \sum_{i=1}^{n-1} P_i$. The Dirichlet distribution takes
$(P_1, \ldots, P_{n-1})$ to be uniformly distributed over the set
$S = \{(p_1, \ldots, p_{n-1}) : \sum_{i=1}^{n-1} p_i < 1,\ p_i > 0,\ i = 1, \ldots, n - 1\}$.
That is, the Dirichlet density is

$$f_{P_1, \ldots, P_{n-1}}(p_1, \ldots, p_{n-1}) = C, \qquad p_i > 0,\ i = 1, \ldots, n - 1,\ \sum_{i=1}^{n-1} p_i < 1$$


(a) Determine C. Hint: Use results from Section 6.3.1. 


Self-Test Problems and Exercises 


6.1. Each throw of an unfair die lands on each of the odd 
numbers 1, 3,5 with probability C and on each of the even 
numbers with probability 2C. 


(a) Find C. 

(b) Suppose that the die is tossed. Let X equal 1 if the 
result is an even number, and let it be 0 otherwise. Also, 
let Y equal 1 if the result is a number greater than three 
and let it be 0 otherwise. Find the joint probability mass 
function of X and Y. Suppose now that 12 independent 
tosses of the die are made. 

(c) Find the probability that each of the six outcomes 
occurs exactly twice. 

(d) Find the probability that 4 of the outcomes are either 
one or two, 4 are either three or four, and 4 are either five 
or six. 

(e) Find the probability that at least 8 of the tosses land on 
even numbers. 


6.2. The joint probability mass function of the random
variables X, Y, Z is

$$p(1, 2, 3) = p(2, 1, 1) = p(2, 2, 1) = p(2, 3, 2) = \frac{1}{4}$$

Find (a) $E[XYZ]$, and (b) $E[XY + XZ + YZ]$.

Let $U_1, \ldots, U_n$ be independent uniform (0, 1) random variables.
(b) Show that the Dirichlet density is the conditional density
of $U_1, \ldots, U_{n-1}$ given that $\sum_{i=1}^{n-1} U_i < 1$.

*(c) Show that $U_{(1)}, U_{(2)} - U_{(1)}, \ldots, U_{(n)} - U_{(n-1)}$ has a
Dirichlet distribution, where $U_{(1)}, \ldots, U_{(n)}$ are the order
statistics of $U_1, \ldots, U_n$.


6.40. Let $F_{X_1, \ldots, X_n}(x_1, \ldots, x_n)$ and $f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)$ be,
respectively, the joint distribution function and the joint
density function of $X_1, \ldots, X_n$.

Show that

$$\frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)$$


6.41. For given constants $c_i > 0$, let $Y_i = c_i X_i$,
$i = 1, \ldots, n$, and let $F_{Y_1, \ldots, Y_n}(x_1, \ldots, x_n)$ and $f_{Y_1, \ldots, Y_n}(x_1, \ldots, x_n)$
be, respectively, the joint distribution function and the joint
density function of $Y_1, \ldots, Y_n$.

(a) Express $F_{Y_1, \ldots, Y_n}(x_1, \ldots, x_n)$ in terms of the joint distribution
function of $X_1, \ldots, X_n$.

(b) Express $f_{Y_1, \ldots, Y_n}(x_1, \ldots, x_n)$ in terms of the joint density
function of $X_1, \ldots, X_n$.

(c) Use Equation (7.3) to verify your answer to part (b).


6.3. The joint density of X and Y is given by

$$f(x, y) = C(y - x)e^{-y}, \qquad -y < x < y,\ 0 < y < \infty$$

(a) Find C.

(b) Find the density function of X.
(c) Find the density function of Y.
(d) Find E[X].

(e) Find E[Y].


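6.4. Let $r = r_1 + \cdots + r_k$, where all $r_i$ are positive integers.
Argue that if $X_1, \ldots, X_r$ has a multinomial distribution,
then so does $Y_1, \ldots, Y_k$ where, with $r_0 = 0$,

$$Y_i = \sum_{j = r_0 + \cdots + r_{i-1} + 1}^{r_0 + \cdots + r_i} X_j, \qquad i = 1, \ldots, k$$

That is, $Y_1$ is the sum of the first $r_1$ of the X’s, $Y_2$ is the
sum of the next $r_2$, and so on.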

6.5. Suppose that X, Y, and Z are independent random
variables that are each equally likely to be either 1 or
2. Find the probability mass function of (a) $XYZ$, (b)
$XY + XZ + YZ$, and (c) $X^2 + YZ$.


6.6. Let X and Y be continuous random variables with
joint density function

$$f(x, y) = \begin{cases} \dfrac{x}{5} + cy & 0 < x < 1,\ 1 < y < 5 \\ 0 & \text{otherwise} \end{cases}$$

where $c$ is a constant.


(a) What is the value of $c$?
(b) Are X and Y independent?
(c) Find $P\{X + Y > 3\}$.


6.7. The joint density function of X and Y is

$$f(x, y) = \begin{cases} xy & 0 < x < 1,\ 0 < y < 2 \\ 0 & \text{otherwise} \end{cases}$$

(a) Are X and Y independent?

(b) Find the density function of X.

(c) Find the density function of Y.

(d) Find the joint distribution function.
(e) Find E[Y].

(f) Find $P\{X + Y < 1\}$.


6.8. Consider two components and three types of shocks.
A type 1 shock causes component 1 to fail, a type 2 shock
causes component 2 to fail, and a type 3 shock causes both
components 1 and 2 to fail. The times until shocks 1, 2,
and 3 occur are independent exponential random variables
with respective rates $\lambda_1$, $\lambda_2$, and $\lambda_3$. Let $X_i$ denote the time
at which component $i$ fails, $i = 1, 2$. The random variables
$X_1$, $X_2$ are said to have a joint bivariate exponential distribution.
Find $P\{X_1 > s, X_2 > t\}$.


6.9. Consider a directory of classified advertisements that 
consists of m pages, where m is very large. Suppose that 
the number of advertisements per page varies and that 
your only method of finding out how many advertisements 
there are on a specified page is to count them. In addition, 
suppose that there are too many pages for it to be feasible 
to make a complete count of the total number of adver- 
tisements and that your objective is to choose a directory 
advertisement in such a way that each of them has an equal 
chance of being selected. 


(a) If you randomly choose a page and then randomly 
choose an advertisement from that page, would that satisfy 
your objective? Why or why not? 

Let n(i) denote the number of advertisements on page 
i,i = 1,...,m, and suppose that whereas these quantities 


are unknown, we can assume that they are all less than 
or equal to some specified value n. Consider the following 
algorithm for choosing an advertisement. 


Step 1. Choose a page at random. Suppose it is page X.
Determine n(X) by counting the number of advertisements
on page X.

Step 2. “Accept” page X with probability n(X)/n. If page
X is accepted, go to step 3. Otherwise, return to
step 1.

Step 3. Randomly choose one of the advertisements on
page X.


Call each pass of the algorithm through step 1 an iter- 
ation. For instance, if the first randomly chosen page 
is rejected and the second accepted, then we would 
have needed 2 iterations of the algorithm to obtain an 
advertisement. 
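
The three steps above can be sketched as a short simulation, which may help in exploring the questions that follow. This is a minimal sketch under stated assumptions: pages and advertisements are indexed from 0, `counts[i]` plays the role of n(i), `n_max` plays the role of the bound n, and the function names are illustrative rather than taken from the text.

```python
import random
from collections import Counter

def choose_advertisement(counts, n_max, rng):
    """Run the accept/reject algorithm once.

    counts[i] is the number of advertisements on page i and n_max is an
    upper bound on every counts[i]. Returns (page, ad_index, iterations).
    """
    m = len(counts)
    iterations = 0
    while True:
        iterations += 1
        page = rng.randrange(m)                     # Step 1: random page
        if rng.random() < counts[page] / n_max:     # Step 2: accept w.p. n(X)/n
            ad = rng.randrange(counts[page])        # Step 3: random ad on page
            return page, ad, iterations

def empirical_frequencies(counts, n_max, trials, seed=0):
    """Relative frequency with which each (page, ad) pair is selected."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        page, ad, _ = choose_advertisement(counts, n_max, rng)
        tally[(page, ad)] += 1
    return {key: count / trials for key, count in tally.items()}
```

For instance, `empirical_frequencies([1, 3, 6], 6, 60000)` tabulates how often each of the 10 advertisements in a hypothetical three-page directory is chosen, which can be compared with the stated objective of giving every advertisement an equal chance.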


(b) What is the probability that a single iteration of the 
algorithm results in the acceptance of an advertisement on 
page i? 

(c) What is the probability that a single iteration of the 
algorithm results in the acceptance of an advertisement? 
(d) What is the probability that the algorithm goes through 
k iterations, accepting the jth advertisement on page i on 
the final iteration? 

(e) What is the probability that the jth advertisement on
page i is the advertisement obtained from the algorithm?
(f) What is the expected number of iterations taken by the 
algorithm? 


6.10. The “random” parts of the algorithm in Self-Test 
Problem 6.9 can be written in terms of the generated val- 
ues of a sequence of independent uniform (0, 1) random 
variables, known as random numbers. With [x] defined as 
the largest integer less than or equal to x, the first step can 
be written as follows: 


Step 1. Generate a uniform (0, 1) random variable U. Let 
X = [mU] + 1, and determine the value of n(X). 


(a) Explain why the above is equivalent to step 1 of Self-Test
Problem 6.9.


Hint: What is the probability mass function of X? 
(b) Write the remaining steps of the algorithm in a similar 
style. 

6.11. Let $X_1, X_2, \ldots$ be a sequence of independent uniform
(0, 1) random variables. For a fixed constant $c$, define the
random variable N by

$$N = \min\{n : X_n > c\}$$

Is N independent of $X_N$? That is, does knowing the
value of the first random variable that is greater than c 
affect the probability distribution of when this random 
variable occurs? Give an intuitive explanation for your 
answer. 


6.12. The accompanying dartboard is a square whose sides 
are of length 6: 


The three circles are all centered at the center of the board 
and are of radii 1, 2, and 3, respectively. Darts landing 
within the circle of radius 1 score 30 points, those land- 
ing outside this circle, but within the circle of radius 2, 
are worth 20 points, and those landing outside the circle 
of radius 2, but within the circle of radius 3, are worth 10 
points. Darts that do not land within the circle of radius 3 
do not score any points. Assuming that each dart that you 
throw will, independently of what occurred on your pre- 
vious throws, land on a point uniformly distributed in the 
square, find the probabilities of the accompanying events: 


(a) You score 20 on a throw of the dart. 

(b) You score at least 20 on a throw of the dart. 

(c) You score 0 on a throw of the dart. 

(d) The expected value of your score on a throw of 
the dart. 

(e) Both of your first two throws score at least 10. 

(f) Your total score after two throws is 30. 


6.13. A model proposed for NBA basketball supposes that 
when two teams with roughly the same record play each 
other, the number of points scored in a quarter by the 
home team minus the number scored by the visiting team 
is approximately a normal random variable with mean 1.5 
and variance 6. In addition, the model supposes that the 
point differentials for the four quarters are independent. 
Assume that this model is correct. 


(a) What is the probability that the home team wins? 

(b) What is the conditional probability that the home team 
wins, given that it is behind by 5 points at halftime? 

(c) What is the conditional probability that the home team 
wins, given that it is ahead by 5 points at the end of the first 
quarter? 


6.14. Let N be a geometric random variable with parameter
$p$. Suppose that the conditional distribution of X given
that $N = n$ is the gamma distribution with parameters $n$
and $\lambda$. Find the conditional probability mass function of N
given that $X = x$.


6.15. Let X and Y be independent uniform (0, 1) random 
variables. 


(a) Find the joint density of $U = X$, $V = X + Y$.
(b) Use the result obtained in part (a) to compute the density
function of V.


6.16. You and three other people are to place bids for an 
object, with the high bid winning. If you win, you plan to 
sell the object immediately for $10,000. How much should 
you bid to maximize your expected profit if you believe 
that the bids of the others can be regarded as being independent
and uniformly distributed between $7000 and
$10,000?


6.17. Find the probability that $X_1, X_2, \ldots, X_n$ is a permutation
of $1, 2, \ldots, n$, when $X_1, X_2, \ldots, X_n$ are independent
and


(a) each is equally likely to be any of the values $1, \ldots, n$;
(b) each has the probability mass function $P\{X_i = j\} = p_j$,
$j = 1, \ldots, n$.


6.18. Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ be independent random
vectors, with each vector being a random ordering of
$k$ ones and $n - k$ zeros. That is, their joint probability mass
functions are

$$P\{X_1 = i_1, \ldots, X_n = i_n\} = P\{Y_1 = i_1, \ldots, Y_n = i_n\} = \frac{1}{\binom{n}{k}}, \qquad i_j = 0, 1,\ \sum_{j=1}^{n} i_j = k$$

Let

$$N = \sum_{i=1}^{n} |X_i - Y_i|$$

denote the number of coordinates at which the two vectors
have different values. Also, let M denote the number
of values of $i$ for which $X_i = 1$, $Y_i = 0$.


(a) Relate N to M. 

(b) What is the distribution of M? 
(c) Find E[N]. 

(d) Find Var(N). 


"6.19. Let Z1,Z2,...,Zy be independent standard normal 
random variables, and let

$$S_j = \sum_{i=1}^{j} Z_i$$

(a) What is the conditional distribution of $S_n$ given that
$S_k = y$? Find it for $k = 1, \ldots, n - 1$.

(b) Show that, for $1 \le k \le n$, the conditional distribution
of $S_k$ given that $S_n = x$ is normal with mean $xk/n$ and
variance $k(n - k)/n$.

6.20. Let $X_1, X_2, \ldots$ be a sequence of independent and
identically distributed continuous random variables. Find


(a) $P\{X_6 > X_1 \mid X_1 = \max(X_1, \ldots, X_5)\}$

(b) $P\{X_6 > X_2 \mid X_1 = \max(X_1, \ldots, X_5)\}$

6.21. Prove the identity

$$P\{X \le s, Y \le t\} = P\{X \le s\} + P\{Y \le t\} + P\{X > s, Y > t\} - 1$$

Hint: Derive an expression for $P\{X > s, Y > t\}$ by taking
the probability of the complementary event.


6.22. In Example Ic, find P(X; = i, Y; = j) when] < i. 


6.23. A Pareto random variable X with parameters $a > 0$,
$\lambda > 0$ has distribution function $F(x) = 1 - a^{\lambda}x^{-\lambda}$, $x > a$.
For $x_0 > a$, verify that the conditional distribution of X
given that $X > x_0$ is that of a Pareto random variable with
parameters $(x_0, \lambda)$ by evaluating $P\{X > x \mid X > x_0\}$.


6.24. Verify the identity $f_X(x) = \int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\,f_Y(y)\,dy$.


6.25. In a contest originating with $n$ players, each player
independently advances to the next round, with player $i$
advancing with probability $p_i$. If no players advance to the
next round, then the contest ends and all the players in
the just concluded round are declared co-winners. If only
one player advances, then that player is declared the winner
and the contest ends. If two or more players advance,
then those players play another round. Let $X_i$ denote the
number of rounds that $i$ plays.


(a) Find $P(X_i \ge k)$. Hint: Note that $\{X_i \ge k\}$ will occur
if $i$ advances at least $k$ times and at least one of the other
players advances at least $k - 1$ times.

(b) Find P($i$ is either the sole winner or one of the co-winners).
Hint: It might help to imagine that a player always continues
to play rounds until he or she fails to advance. (That
is, if there is a sole winner then imagine that that player
continues on until a failure occurs.)


(c) Find P($i$ is the sole winner).


6.26. Let $X_1, \ldots, X_n$ be independent nonnegative integer
valued random variables, and let $a_i = P(X_i \text{ is even})$,
$i = 1, \ldots, n$. With $S = \sum_{i=1}^{n} X_i$ we want to determine
$p = P(S \text{ is even})$. Let $Y_i = 1$ if $X_i$ is even and let it equal $-1$ if
$X_i$ is odd.

In parts (a) and (b) fill in the missing word at the end of
the sentence.


(a) S is even if and only if the number of $X_1, \ldots, X_n$ that
are odd is

(b) S is even if and only if $\prod_{i=1}^{n} Y_i$ is

(c) Find $E\left[\prod_{i=1}^{n} Y_i\right]$.

(d) Find P(S is even). Hint: Use parts (b) and (c).


7.1 Introduction


In this chapter, we develop and exploit additional properties of expected values. To
begin, recall that the expected value of the random variable X is defined by

$$E[X] = \sum_{x} x\,p(x)$$

when X is a discrete random variable with probability mass function p(x), and by

$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$$

when X is a continuous random variable with probability density function f(x).
Since E[X] is a weighted average of the possible values of X, it follows that if X
must lie between a and b, then so must its expected value. That is, if

$$P\{a \le X \le b\} = 1$$

then

$$a \le E[X] \le b$$

To verify the preceding statement, suppose that X is a discrete random variable for
which $P\{a \le X \le b\} = 1$. Since this implies that p(x) = 0 for all x outside of the
interval [a, b], it follows that

$$E[X] = \sum_{x:\,p(x)>0} x\,p(x) \ge \sum_{x:\,p(x)>0} a\,p(x) = a \sum_{x:\,p(x)>0} p(x) = a$$

In the same manner, it can be shown that $E[X] \le b$, so the result follows for discrete
random variables. As the proof in the continuous case is similar, the result follows.
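
As a small numerical illustration of the bound just proved, the sketch below computes E[X] for a hypothetical probability mass function (the values 1, 2, 6 and their probabilities are made up for this example) and confirms that the result lies between the smallest and largest possible values. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def expected_value(pmf):
    """E[X] = sum of x * p(x) over the values x with p(x) > 0."""
    return sum(x * p for x, p in pmf.items())

# Hypothetical mass function: P{X=1} = 1/2, P{X=2} = 1/3, P{X=6} = 1/6.
pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 6: Fraction(1, 6)}

mean = expected_value(pmf)          # 1/2 + 2/3 + 1 = 13/6
a, b = min(pmf), max(pmf)           # here a = 1 and b = 6
assert a <= mean <= b               # the bound a <= E[X] <= b
```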


7.2 Expectation of Sums of Random Variables 


For a two-dimensional analog of Propositions 4.1 of Chapter 4 and 2.1 of Chapter 5,
which give the computational formulas for the expected value of a function of a
random variable, suppose that X and Y are random variables and g is a function of
two variables. Then we have the following result.


Proposition 2.1

If X and Y have a joint probability mass function p(x, y), then

$$E[g(X, Y)] = \sum_{y} \sum_{x} g(x, y)\,p(x, y)$$

If X and Y have a joint probability density function f(x, y), then

$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy$$


Let us give a proof of Proposition 2.1 when the random variables X and Y are
jointly continuous with joint density function f(x, y) and when g(X, Y) is a nonnegative
random variable. Because $g(X, Y) \ge 0$, we have, by Lemma 2.1 of Chapter 5,
that

$$E[g(X, Y)] = \int_{0}^{\infty} P\{g(X, Y) > t\}\,dt$$

Writing

$$P\{g(X, Y) > t\} = \iint_{(x,y):\,g(x,y)>t} f(x, y)\,dy\,dx$$

shows that

$$E[g(X, Y)] = \int_{0}^{\infty} \iint_{(x,y):\,g(x,y)>t} f(x, y)\,dy\,dx\,dt$$

Interchanging the order of integration gives

$$E[g(X, Y)] = \int_{x} \int_{y} \int_{t=0}^{g(x,y)} f(x, y)\,dt\,dy\,dx = \int_{x} \int_{y} g(x, y)\,f(x, y)\,dy\,dx$$

Thus, the result is proven when g(X, Y) is a nonnegative random variable. The general
case then follows as in the one-dimensional case. (See Theoretical Exercises 2
and 3 of Chapter 5.)


Example 2a


An accident occurs at a point X that is uniformly distributed on a road of length L. 
At the time of the accident, an ambulance is at a location Y that is also uniformly 
distributed on the road. Assuming that X and Y are independent, find the expected 
distance between the ambulance and the point of the accident. 


Example 
2b 
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Solution We need to compute E[|X — Y|]. Since the joint density function of X and 
Y is 1 
FRY) = 7a: O<x<L, O0O<y<L 


it follows from Proposition 2.1 that 


1 L Ls 
AIX - Y= 7 f [ x — yldydx 


Now, 
EL x L 
[w= var= [oo - nay + [oo - nay 
0) 0 x 
x2 L? x2 
= L 
gh oy ee 
L2 
Sa 4 gy = eh 
Therefore, 


For an important application of Proposition 2.1, suppose that ELX] and E[Y] are 
both finite and let g(X, Y) = X + Y. Then, in the continuous case, 


E[X + m=] / (x + y) fy) dx dy 


-|/ if xfirydyds + f / y f(x, y) dx dy 


lo) (oe) 
=f stewmar+ [ yfrondy 
—0o —oo 
= E[X] + E[Y] 
The same result holds in general; thus, whenever ELX] and E[Y] are finite, 


E[X + Y] =E[X] + FLY] (2.1) 


Suppose that for random variables X and Y, 
X=Y 


That is, for any outcome of the probability experiment, the value of the random 
variable X is greater than or equal to the value of the random variable Y. Since 
X = Y is equivalent to the inequality ¥ — Y = 0, it follows that ELY — Y] = 0, 
or, equivalently, 

E[X] = E[Y] O 


Using Equation (2.1), we may show by a simple induction proof that if $E[X_i]$ is
finite for all $i = 1, \ldots, n$, then

$$E[X_1 + \cdots + X_n] = E[X_1] + \cdots + E[X_n] \tag{2.2}$$
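
Equation (2.2) requires no independence assumption. As a quick numerical check (a minimal sketch; the two-dice setup and the helper name `expectation` are illustrative choices, not taken from the text), the following Python fragment computes exact expectations for two dependent random variables, the value of one fair die and the larger of two fair dice, and confirms that the expectation of their sum equals the sum of their expectations.

```python
from fractions import Fraction
from itertools import product

# Two fair dice; X is the first die and Y is the larger of the two,
# so X and Y are dependent.
outcomes = list(product(range(1, 7), repeat=2))
prob = Fraction(1, len(outcomes))


def expectation(g):
    """Exact expected value of g(d1, d2) over the 36 equally likely outcomes."""
    return sum(prob * g(d1, d2) for d1, d2 in outcomes)


e_x = expectation(lambda d1, d2: d1)
e_y = expectation(lambda d1, d2: max(d1, d2))
e_sum = expectation(lambda d1, d2: d1 + max(d1, d2))

print(e_x, e_y, e_sum)
```

Exact rational arithmetic is used so that the equality can be tested without rounding error.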
Equation (2.2) is an extremely useful formula whose utility will now be illustrated
by a series of examples.


Example 2c  The sample mean

Let $X_1, \ldots, X_n$ be independent and identically distributed random variables having
distribution function F and expected value $\mu$. Such a sequence of random variables
is said to constitute a sample from the distribution F. The quantity

$$\bar{X} = \sum_{i=1}^{n} \frac{X_i}{n}$$

is called the sample mean. Compute $E[\bar{X}]$.

Solution

$$
\begin{aligned}
E[\bar{X}] &= E\left[\sum_{i=1}^{n} \frac{X_i}{n}\right] \\
&= \frac{1}{n} E\left[\sum_{i=1}^{n} X_i\right] \\
&= \frac{1}{n} \sum_{i=1}^{n} E[X_i] \\
&= \mu \qquad \text{since } E[X_i] = \mu
\end{aligned}
$$

That is, the expected value of the sample mean is $\mu$, the mean of the distribution.
When the distribution mean $\mu$ is unknown, the sample mean is often used in statistics
to estimate it. ■


Example 2d  Boole's inequality

Let $A_1, \ldots, A_n$ denote events, and define the indicator variables $X_i$, $i = 1, \ldots, n$, by

$$X_i = \begin{cases} 1 & \text{if } A_i \text{ occurs} \\ 0 & \text{otherwise} \end{cases}$$

Let

$$X = \sum_{i=1}^{n} X_i$$

so X denotes the number of the events $A_i$ that occur. Finally, let

$$Y = \begin{cases} 1 & \text{if } X \ge 1 \\ 0 & \text{otherwise} \end{cases}$$

so Y is equal to 1 if at least one of the $A_i$ occurs and is 0 otherwise. Now, it is
immediate that

$$X \ge Y$$

so

$$E[X] \ge E[Y]$$

But since

$$E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} P(A_i)$$

and

$$E[Y] = P\{\text{at least one of the } A_i \text{ occur}\} = P\left(\bigcup_{i=1}^{n} A_i\right)$$

we obtain Boole's inequality, namely,

$$P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} P(A_i)$$

■


The next three examples show how Equation (2.2) can be used to calculate the
expected value of binomial, negative binomial, and hypergeometric random
variables. These derivations should be compared with those presented in Chapter 4.


Example 2e  Expectation of a binomial random variable

Let X be a binomial random variable with parameters n and p. Recalling that such
a random variable represents the number of successes in n independent trials when
each trial has probability p of being a success, we have that

$$X = X_1 + X_2 + \cdots + X_n$$

where

$$X_i = \begin{cases} 1 & \text{if the } i\text{th trial is a success} \\ 0 & \text{if the } i\text{th trial is a failure} \end{cases}$$

Hence, $X_i$ is a Bernoulli random variable having expectation $E[X_i] = 1(p) + 0(1 - p) = p$. Thus,

$$E[X] = E[X_1] + E[X_2] + \cdots + E[X_n] = np$$

■


Example 2f  Mean of a negative binomial random variable


If independent trials having a constant probability p of being successes are
performed, determine the expected number of trials required to amass a total of r
successes.


Solution If X denotes the number of trials needed to amass a total of r successes,
then X is a negative binomial random variable that can be represented by

$$X = X_1 + X_2 + \cdots + X_r$$

where $X_1$ is the number of trials required to obtain the first success, $X_2$ the number
of additional trials until the second success is obtained, $X_3$ the number of additional
trials until the third success is obtained, and so on. That is, $X_i$ represents the number
of additional trials required after the (i − 1)st success until a total of i successes is
amassed. A little thought reveals that each of the random variables $X_i$ is a geometric
random variable with parameter p. Hence, from the results of Example 8b of
Chapter 4, $E[X_i] = 1/p$, $i = 1, 2, \ldots, r$; thus,

$$E[X] = E[X_1] + \cdots + E[X_r] = \frac{r}{p}$$

■


Example 2g  Mean of a hypergeometric random variable


If n balls are randomly selected from an urn containing N balls of which m are white, 
find the expected number of white balls selected. 


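Solution Let X denote the number of white balls selected, and represent X as

$$X = X_1 + \cdots + X_m$$

where

$$X_i = \begin{cases} 1 & \text{if the } i\text{th white ball is selected} \\ 0 & \text{otherwise} \end{cases}$$

Now

$$
\begin{aligned}
E[X_i] &= P\{X_i = 1\} \\
&= P\{i\text{th white ball is selected}\} \\
&= \frac{\binom{1}{1}\binom{N-1}{n-1}}{\binom{N}{n}} \\
&= \frac{n}{N}
\end{aligned}
$$

Hence,

$$E[X] = E[X_1] + \cdots + E[X_m] = \frac{mn}{N}$$

We could also have obtained the preceding result by using the alternative
representation

$$X = Y_1 + \cdots + Y_n$$

where

$$Y_i = \begin{cases} 1 & \text{if the } i\text{th ball selected is white} \\ 0 & \text{otherwise} \end{cases}$$

Since the ith ball selected is equally likely to be any of the N balls, it follows that

$$E[Y_i] = \frac{m}{N}$$

so

$$E[X] = E[Y_1] + \cdots + E[Y_n] = \frac{nm}{N}$$

■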
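
The indicator argument gives the hypergeometric mean without ever summing over the mass function. As a cross-check (a sketch only; the function name `hypergeometric_mean` and the test values are assumptions made for illustration), the mean can also be computed by brute force from the hypergeometric probabilities and compared with nm/N.

```python
from fractions import Fraction
from math import comb


def hypergeometric_mean(N, m, n):
    """Exact E[X] computed directly from the hypergeometric mass function.

    N balls in the urn, m of them white, n drawn without replacement.
    """
    total = comb(N, n)
    return sum(
        Fraction(k * comb(m, k) * comb(N - m, n - k), total)
        for k in range(0, min(m, n) + 1)
    )


# Compare the direct sum with the closed form n*m/N.
print(hypergeometric_mean(20, 7, 5), Fraction(5 * 7, 20))
```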


Example 2h  Expected number of matches

Suppose that N people throw their hats into the center of a room. The hats are mixed
up, and each person randomly selects one. Find the expected number of people who
select their own hat.


Solution Letting X denote the number of matches, we can compute E[X] most easily
by writing

$$X = X_1 + X_2 + \cdots + X_N$$

where

$$X_i = \begin{cases} 1 & \text{if the } i\text{th person selects his own hat} \\ 0 & \text{otherwise} \end{cases}$$

Since, for each i, the ith person is equally likely to select any of the N hats,

$$E[X_i] = P\{X_i = 1\} = \frac{1}{N}$$

Thus,

$$E[X] = E[X_1] + \cdots + E[X_N] = \left(\frac{1}{N}\right) N = 1$$

Hence, on the average, exactly one person selects his own hat. ■


Example 2i  Coupon-collecting problems


Suppose that there are N types of coupons, and each time one obtains a coupon, it 
is equally likely to be any one of the N types. Find the expected number of coupons 
one needs to amass before obtaining a complete set of at least one of each type. 


Solution Let X denote the number of coupons collected before a complete set is
attained. We compute E[X] by using the same technique we used in computing the
mean of a negative binomial random variable (Example 2f). That is, we define $X_i$,
$i = 0, 1, \ldots, N - 1$, to be the number of additional coupons that need be obtained after
i distinct types have been collected in order to obtain another distinct type, and we
note that

$$X = X_0 + X_1 + \cdots + X_{N-1}$$

When i distinct types of coupons have already been collected, a new coupon obtained
will be of a distinct type with probability (N − i)/N. Therefore,

$$P\{X_i = k\} = \frac{N - i}{N} \left(\frac{i}{N}\right)^{k-1}, \qquad k \ge 1$$

or, in other words, $X_i$ is a geometric random variable with parameter (N − i)/N.
Hence,

$$E[X_i] = \frac{N}{N - i}$$

implying that

$$
\begin{aligned}
E[X] &= 1 + \frac{N}{N-1} + \frac{N}{N-2} + \cdots + \frac{N}{1} \\
&= N\left[1 + \cdots + \frac{1}{N-1} + \frac{1}{N}\right]
\end{aligned}
$$

■


Example 2j

Ten hunters are waiting for ducks to fly by. When a flock of ducks flies overhead, the
hunters fire at the same time, but each chooses his target at random, independently
of the others. If each hunter independently hits his target with probability p,
compute the expected number of ducks that escape unhurt when a flock of size 10 flies
overhead.


Solution Let $X_i$ equal 1 if the ith duck escapes unhurt and 0 otherwise, for
$i = 1, 2, \ldots, 10$. The expected number of ducks to escape can be expressed as

$$E[X_1 + \cdots + X_{10}] = E[X_1] + \cdots + E[X_{10}]$$

To compute $E[X_i] = P\{X_i = 1\}$, we note that each of the hunters will, independently,
hit the ith duck with probability p/10, so

$$P\{X_i = 1\} = \left(1 - \frac{p}{10}\right)^{10}$$

Hence,

$$E[X] = 10\left(1 - \frac{p}{10}\right)^{10}$$

■


Example 2k  Expected number of runs


Suppose that a sequence of n 1's and m 0's is randomly permuted so that each of the
(n + m)!/(n! m!) possible arrangements is equally likely. Any consecutive string of
1's is said to constitute a run of 1's; for instance, if n = 6, m = 4, and the ordering
is 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, then there are 3 runs of 1's. We are interested in
computing the mean number of such runs. To compute this quantity, let

$$I_i = \begin{cases} 1 & \text{if a run of 1's starts at the } i\text{th position} \\ 0 & \text{otherwise} \end{cases}$$

Therefore, R(1), the number of runs of 1, can be expressed as

$$R(1) = \sum_{i=1}^{n+m} I_i$$

and it follows that

$$E[R(1)] = \sum_{i=1}^{n+m} E[I_i]$$

Now,

$$E[I_1] = P\{\text{"1" in position 1}\} = \frac{n}{n + m}$$

and for $1 < i \le n + m$,

$$E[I_i] = P\{\text{"0" in position } i - 1,\ \text{"1" in position } i\} = \frac{m}{n + m} \cdot \frac{n}{n + m - 1}$$

Hence,

$$E[R(1)] = \frac{n}{n + m} + (n + m - 1)\frac{nm}{(n + m)(n + m - 1)}$$

Similarly, E[R(0)], the expected number of runs of 0's, is

$$E[R(0)] = \frac{m}{n + m} + \frac{nm}{n + m}$$

and the expected number of runs of either type is

$$E[R(1) + R(0)] = 1 + \frac{2nm}{n + m}$$

■
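
For small n and m the formula for the expected number of runs can be confirmed by listing every arrangement. The sketch below is illustrative only (the helper names `count_runs` and `expected_runs` are assumptions, not the text's notation); it enumerates the positions of the 1's, counts runs of either type in each arrangement, and averages exactly.

```python
from fractions import Fraction
from itertools import combinations


def count_runs(seq):
    """Number of maximal blocks of equal consecutive symbols in seq."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def expected_runs(n, m):
    """Exact mean number of runs of either type over all arrangements
    of n 1's and m 0's, each arrangement being equally likely."""
    total_runs = 0
    arrangements = 0
    for ones in combinations(range(n + m), n):
        seq = [0] * (n + m)
        for pos in ones:
            seq[pos] = 1
        total_runs += count_runs(seq)
        arrangements += 1
    return Fraction(total_runs, arrangements)


# The ordering used in the text has 3 runs of 1's and 3 runs of 0's.
print(count_runs([1, 1, 1, 0, 1, 1, 0, 0, 1, 0]))
print(expected_runs(6, 4), 1 + Fraction(2 * 6 * 4, 6 + 4))
```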


Example 2l  A random walk in the plane

Consider a particle initially located at a given point in the plane, and suppose that it
undergoes a sequence of steps of fixed length, but in a completely random direction.
Specifically, suppose that the new position after each step is one unit of distance from
the previous position and at an angle of orientation from the previous position that
is uniformly distributed over $(0, 2\pi)$. (See Figure 7.1.) Compute the expected square
of the distance from the origin after n steps.


[Figure 7.1: the initial position, the position after the first step, and the position after the second step.]


Solution Letting $(X_i, Y_i)$ denote the change in position at the ith step, $i = 1, \ldots, n$,
in rectangular coordinates, we have

$$X_i = \cos\theta_i$$
$$Y_i = \sin\theta_i$$

where $\theta_i$, $i = 1, \ldots, n$, are, by assumption, independent uniform $(0, 2\pi)$ random
variables. Because the position after n steps has rectangular coordinates
$\left(\sum_{i=1}^{n} X_i, \sum_{i=1}^{n} Y_i\right)$, it follows that $D^2$, the square of the distance
from the origin, is given by

$$
\begin{aligned}
D^2 &= \left(\sum_{i=1}^{n} X_i\right)^2 + \left(\sum_{i=1}^{n} Y_i\right)^2 \\
&= \sum_{i=1}^{n} (X_i^2 + Y_i^2) + \sum\sum_{i \ne j} (X_i X_j + Y_i Y_j) \\
&= n + \sum\sum_{i \ne j} (\cos\theta_i \cos\theta_j + \sin\theta_i \sin\theta_j)
\end{aligned}
$$

where $\cos^2\theta_i + \sin^2\theta_i = 1$. Taking expectations and using the independence of $\theta_i$
and $\theta_j$ when $i \ne j$ and the fact that

$$2\pi E[\cos\theta_i] = \int_{0}^{2\pi} \cos u\,du = \sin 2\pi - \sin 0 = 0$$

$$2\pi E[\sin\theta_i] = \int_{0}^{2\pi} \sin u\,du = \cos 0 - \cos 2\pi = 0$$

we arrive at

$$E[D^2] = n$$

■
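
The conclusion that the expected squared distance equals n can be illustrated by simulation. The following is a minimal Monte Carlo sketch, not part of the original solution; the function name `mean_squared_distance`, the number of simulated walks, and the fixed seed are all assumptions chosen for reproducibility.

```python
import math
import random


def mean_squared_distance(n_steps, n_walks, seed=0):
    """Average of D^2 over n_walks simulated walks of n_steps unit steps,
    each step taken at an independent angle uniform on (0, 2*pi)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        x = y = 0.0
        for _ in range(n_steps):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            x += math.cos(theta)
            y += math.sin(theta)
        total += x * x + y * y
    return total / n_walks


# The estimate should be close to the number of steps.
print(mean_squared_distance(10, 20000))
```

Because the estimate is random, it will only be near n, with the error shrinking as the number of simulated walks grows.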


Example 2m  Analyzing the quick-sort algorithm


Suppose that we are presented with a set of n distinct values $x_1, x_2, \ldots, x_n$ and that
we desire to put them in increasing order, or as it is commonly stated, to sort them.
An efficient procedure for accomplishing this task is the quick-sort algorithm, which
is defined as follows: When n = 2, the algorithm compares the two values and then
puts them in the appropriate order. When n > 2, one of the elements is randomly
chosen, say it is $x_i$, and then all of the other values are compared with $x_i$. Those
smaller than $x_i$ are put in a bracket to the left of $x_i$ and those larger than $x_i$ are put in
a bracket to the right of $x_i$. The algorithm then repeats itself on these brackets and
continues until all values have been sorted. For instance, suppose that we desire to
sort the following 10 distinct values:


5, 9, 3, 10, 11, 14, 8, 4, 17, 6


We start by choosing one of them at random (that is, each value has probability 1/10 of
being chosen). Suppose, for instance, that the value 10 is chosen. We then compare
each of the others to this value, putting in a bracket to the left of 10 all those values
smaller than 10 and to the right all those larger. This gives


{5, 9, 3, 8, 4, 6}, 10, {11, 14, 17}


We now focus on a bracketed set that contains more than a single value, say the one
on the left of the preceding, and randomly choose one of its values, say that 6 is
chosen. Comparing each of the values in the bracket with 6 and putting the smaller
ones in a new bracket to the left of 6 and the larger ones in a bracket to the right
of 6 gives

{5, 3, 4}, 6, {9, 8}, 10, {11, 14, 17}


If we now consider the leftmost bracket, and randomly choose the value 4 for com- 
parison, then the next iteration yields 


{3}, 4, {5}, 6, {9, 8}, 10, {11, 14, 17} 


This continues until there is no bracketed set that contains more than a single value. 

If we let X denote the number of comparisons that it takes the quick-sort algorithm
to sort n distinct numbers, then E[X] is a measure of the effectiveness of this
algorithm. To compute E[X], we will first express X as a sum of other random
variables as follows. To begin, give the following names to the values that are to be
sorted: Let 1 stand for the smallest, let 2 stand for the next smallest, and so on. Then,
for $1 \le i < j \le n$, let I(i, j) equal 1 if i and j are ever directly compared, and let it
equal 0 otherwise. With this definition, it follows that

$$X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} I(i, j)$$

implying that

$$
\begin{aligned}
E[X] &= E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} I(i, j)\right] \\
&= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[I(i, j)] \\
&= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} P\{i \text{ and } j \text{ are ever compared}\}
\end{aligned}
$$

To determine the probability that i and j are ever compared, note that the values
$i, i + 1, \ldots, j - 1, j$ will initially be in the same bracket (since all values are initially
in the same bracket) and will remain in the same bracket if the number chosen for
the first comparison is not between i and j. For instance, if the comparison number is
larger than j, then all the values $i, i + 1, \ldots, j - 1, j$ will go in a bracket to the left of
the comparison number, and if it is smaller than i, then they will all go in a bracket
to the right. Thus all the values $i, i + 1, \ldots, j - 1, j$ will remain in the same bracket
until the first time that one of them is chosen as a comparison value. At that point all
the other values between i and j will be compared with this comparison value. Now,
if this comparison value is neither i nor j, then upon comparison with it, i will go into
a left bracket and j into a right bracket, and thus i and j will be in different brackets
and so will never be compared. On the other hand, if the comparison value of the
set $i, i + 1, \ldots, j - 1, j$ is either i or j, then there will be a direct comparison between
i and j. Now, given that the comparison value is one of the values between i and j,
it follows that it is equally likely to be any of these j − i + 1 values, and thus the
probability that it is either i or j is 2/(j − i + 1). Therefore, we can conclude that

$$P\{i \text{ and } j \text{ are ever compared}\} = \frac{2}{j - i + 1}$$

and

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1}$$

To obtain a rough approximation of the magnitude of E[X] when n is large, we can
approximate the sums by integrals. Now

$$
\begin{aligned}
\sum_{j=i+1}^{n} \frac{2}{j - i + 1} &\approx \int_{i+1}^{n} \frac{2}{x - i + 1}\,dx \\
&= 2\log(x - i + 1)\Big|_{i+1}^{n} \\
&= 2\log(n - i + 1) - 2\log(2) \\
&\approx 2\log(n - i + 1)
\end{aligned}
$$

Thus

$$
\begin{aligned}
E[X] &\approx \sum_{i=1}^{n-1} 2\log(n - i + 1) \\
&\approx 2\int_{1}^{n-1} \log(n - x + 1)\,dx \\
&= 2\int_{2}^{n} \log(y)\,dy \\
&= 2\,(y\log(y) - y)\Big|_{2}^{n} \\
&\approx 2n\log(n)
\end{aligned}
$$

Thus we see that when n is large, the quick-sort algorithm requires, on average,
approximately 2n log(n) comparisons to sort n distinct values. ■
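
The double sum for E[X] can be evaluated exactly and compared with an independent calculation. In the sketch below (illustrative only), `expected_comparisons_sum` evaluates the double sum derived above, while `expected_comparisons_recursive` uses a recurrence that is not in the text: it conditions on the rank of the first randomly chosen value, which costs n − 1 comparisons and leaves two smaller sorting problems. Both function names are assumptions.

```python
from fractions import Fraction
from functools import lru_cache
from math import log


def expected_comparisons_sum(n):
    """E[X] from the double sum of 2/(j - i + 1) over 1 <= i < j <= n."""
    return sum(
        Fraction(2, j - i + 1)
        for i in range(1, n)
        for j in range(i + 1, n + 1)
    )


@lru_cache(maxsize=None)
def expected_comparisons_recursive(n):
    """E[X] by conditioning on the rank k of the first chosen value."""
    if n <= 1:
        return Fraction(0)
    # n - 1 comparisons with the chosen value, then brackets of
    # sizes k - 1 and n - k are sorted in the same way.
    smaller = sum(
        expected_comparisons_recursive(k - 1)
        + expected_comparisons_recursive(n - k)
        for k in range(1, n + 1)
    )
    return (n - 1) + smaller / n


# Exact value next to the rough approximation 2 n log(n) for n = 100.
n = 100
print(float(expected_comparisons_sum(n)), 2 * n * log(n))
```

Printing the two numbers side by side shows how rough the integral approximation is for moderate n; the text only claims agreement in order of magnitude for large n.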


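Example 2n  The probability of a union of events


Let $A_1, \ldots, A_n$ denote events, and define the indicator variables $X_i$, $i = 1, \ldots, n$, by

$$X_i = \begin{cases} 1 & \text{if } A_i \text{ occurs} \\ 0 & \text{otherwise} \end{cases}$$

Now, note that

$$1 - \prod_{i=1}^{n} (1 - X_i) = \begin{cases} 1 & \text{if } \bigcup A_i \text{ occurs} \\ 0 & \text{otherwise} \end{cases}$$

Hence,

$$E\left[1 - \prod_{i=1}^{n} (1 - X_i)\right] = P\left(\bigcup_{i=1}^{n} A_i\right)$$

Expanding the left side of the preceding formula yields

$$
\begin{aligned}
P\left(\bigcup_{i=1}^{n} A_i\right) = E\Bigg[&\sum_{i=1}^{n} X_i - \sum\sum_{i<j} X_i X_j + \sum\sum\sum_{i<j<k} X_i X_j X_k \\
&- \cdots + (-1)^{n+1} X_1 \cdots X_n\Bigg]
\end{aligned}
\tag{2.3}
$$

However,

$$X_{i_1} X_{i_2} \cdots X_{i_k} = \begin{cases} 1 & \text{if } A_{i_1} A_{i_2} \cdots A_{i_k} \text{ occurs} \\ 0 & \text{otherwise} \end{cases}$$

so

$$E[X_{i_1} \cdots X_{i_k}] = P(A_{i_1} \cdots A_{i_k})$$

Thus, Equation (2.3) is just a statement of the well-known inclusion-exclusion
formula for the union of events:

$$
\begin{aligned}
P\left(\bigcup A_i\right) = {} & \sum_{i} P(A_i) - \sum\sum_{i<j} P(A_i A_j) + \sum\sum\sum_{i<j<k} P(A_i A_j A_k) \\
& - \cdots + (-1)^{n+1} P(A_1 \cdots A_n)
\end{aligned}
$$

■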
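
The inclusion-exclusion formula is easy to check on a small finite sample space. The sketch below is an illustration under assumed data (the sample space 1, ..., 30 with equally likely outcomes, and the events "divisible by 2", "divisible by 3", "divisible by 5" are choices made here, as is the function name `union_probability`).

```python
from fractions import Fraction
from itertools import combinations


def union_probability(events, sample_space):
    """P(union of events) by inclusion-exclusion, all outcomes equally likely."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)
        for subset in combinations(events, k):
            common = set.intersection(*subset)
            total += sign * Fraction(len(common), len(sample_space))
    return total


sample_space = set(range(1, 31))
events = [{x for x in sample_space if x % d == 0} for d in (2, 3, 5)]

# Direct computation of the same probability from the union itself.
direct = Fraction(len(set.union(*events)), len(sample_space))
print(union_probability(events, sample_space), direct)
```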


When one is dealing with an infinite collection of random variables $X_i$, $i \ge 1$,
each having a finite expectation, it is not necessarily true that

$$E\left[\sum_{i=1}^{\infty} X_i\right] = \sum_{i=1}^{\infty} E[X_i] \tag{2.4}$$

To determine when (2.4) is valid, we note that $\sum_{i=1}^{\infty} X_i = \lim_{n \to \infty} \sum_{i=1}^{n} X_i$. Thus,

$$
\begin{aligned}
E\left[\sum_{i=1}^{\infty} X_i\right] &= E\left[\lim_{n \to \infty} \sum_{i=1}^{n} X_i\right] \\
&\overset{?}{=} \lim_{n \to \infty} E\left[\sum_{i=1}^{n} X_i\right] \\
&= \lim_{n \to \infty} \sum_{i=1}^{n} E[X_i] \\
&= \sum_{i=1}^{\infty} E[X_i]
\end{aligned}
\tag{2.5}
$$

Hence, Equation (2.4) is valid whenever we are justified in interchanging the
expectation and limit operations in Equation (2.5). Although, in general, this interchange
is not justified, it can be shown to be valid in two important special cases:


1. The $X_i$ are all nonnegative random variables. (That is, $P\{X_i \ge 0\} = 1$ for all i.)
2. $\sum_{i=1}^{\infty} E[|X_i|] < \infty$.
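Consider any nonnegative, integer-valued random variable X. If, for each $i \ge 1$, we
define

$$X_i = \begin{cases} 1 & \text{if } X \ge i \\ 0 & \text{if } X < i \end{cases}$$

then

$$
\begin{aligned}
\sum_{i=1}^{\infty} X_i &= \sum_{i=1}^{X} X_i + \sum_{i=X+1}^{\infty} X_i \\
&= \sum_{i=1}^{X} 1 + \sum_{i=X+1}^{\infty} 0 \\
&= X
\end{aligned}
$$

Hence, since the $X_i$ are all nonnegative, we obtain

$$
\begin{aligned}
E[X] &= \sum_{i=1}^{\infty} E(X_i) \\
&= \sum_{i=1}^{\infty} P\{X \ge i\}
\end{aligned}
\tag{2.6}
$$

a useful identity. ■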
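
Identity (2.6) can be checked directly on any finite mass function. The sketch below uses an assumed example, the number of heads in three fair coin tosses, and hypothetical helper names `mean_direct` and `mean_from_tails`.

```python
from fractions import Fraction


def mean_direct(pmf):
    """E[X] as the usual weighted sum of values."""
    return sum(x * p for x, p in pmf.items())


def mean_from_tails(pmf):
    """E[X] as the sum over i >= 1 of P{X >= i}, as in Equation (2.6)."""
    top = max(pmf)
    return sum(
        sum(p for x, p in pmf.items() if x >= i)
        for i in range(1, top + 1)
    )


# X = number of heads in three fair coin tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
print(mean_direct(pmf), mean_from_tails(pmf))
```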
Suppose that n elements, call them 1, 2, ..., n, must be stored in a computer in the
form of an ordered list. Each unit of time, a request will be made for one of these
elements, i being requested, independently of the past, with probability P(i), $i \ge 1$,
$\sum_{i} P(i) = 1$. Assuming that these probabilities are known, what ordering minimizes
the average position in the line of the element requested?


Solution Suppose that the elements are numbered so that $P(1) \ge P(2) \ge \cdots \ge P(n)$.
To show that 1, 2, ..., n is the optimal ordering, let X denote the position of the
requested element. Now, under any ordering, say, $O = i_1, i_2, \ldots, i_n$,

$$
\begin{aligned}
P_O\{X \ge k\} &= \sum_{j=k}^{n} P(i_j) \\
&\ge \sum_{j=k}^{n} P(j) \\
&= P_{1,2,\ldots,n}\{X \ge k\}
\end{aligned}
$$

Summing over k and using Equation (2.6) yields

$$E_O[X] \ge E_{1,2,\ldots,n}[X]$$

thus showing that ordering the elements in decreasing order of the probability that
they are requested minimizes the expected position of the element requested. ■
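
For a short list the optimal ordering can be found by exhaustive search and compared with the ordering by decreasing request probability. This is a sketch under assumed data: the element names and request probabilities are invented for illustration, as is the helper `expected_position`.

```python
from fractions import Fraction
from itertools import permutations


def expected_position(ordering, request_prob):
    """Average 1-based position of the requested element under an ordering."""
    return sum(
        pos * request_prob[elem]
        for pos, elem in enumerate(ordering, start=1)
    )


# Hypothetical request probabilities for four stored elements.
request_prob = {
    "a": Fraction(1, 10),
    "b": Fraction(4, 10),
    "c": Fraction(2, 10),
    "d": Fraction(3, 10),
}

# Exhaustive search over all 4! orderings.
best = min(
    permutations(request_prob),
    key=lambda order: expected_position(order, request_prob),
)
by_decreasing_prob = tuple(
    sorted(request_prob, key=request_prob.get, reverse=True)
)
print(best, by_decreasing_prob, expected_position(best, request_prob))
```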


7.2.1 Obtaining Bounds from Expectations via the Probabilistic Method


The probabilistic method is a technique for analyzing the properties of the elements
of a set by introducing probabilities on the set and then studying an element chosen
according to those probabilities. The technique was previously seen in Example 4l of
Chapter 3, where it was used to show that a set contained an element that satisfied a
certain property. In this subsection, we show how it can sometimes be used to bound
complicated functions.

Let f be a function on the elements of a finite set A, and suppose that we are
interested in

$$m = \max_{s \in A} f(s)$$

A useful lower bound for m can often be obtained by letting S be a random element
of A for which the expected value of f(S) is computable and then noting that
$m \ge f(S)$ implies that

$$m \ge E[f(S)]$$

with strict inequality if f(S) is not a constant random variable. That is, E[f(S)] is a
lower bound on the maximum value.


The maximum number of Hamiltonian paths in a tournament

A round-robin tournament of n > 2 contestants is a tournament in which each of
the $\binom{n}{2}$ pairs of contestants play each other exactly once. Suppose that the players
are numbered 1, 2, 3, ..., n. The permutation $i_1, i_2, \ldots, i_n$ is said to be a Hamiltonian
path if $i_1$ beats $i_2$, $i_2$ beats $i_3$, ..., and $i_{n-1}$ beats $i_n$. A problem of some interest is to
determine the largest possible number of Hamiltonian paths.

As an illustration, suppose that there are 3 players. On the one hand, if one of
them wins twice, then there is a single Hamiltonian path. (For instance, if 1 wins
twice and 2 beats 3, then the only Hamiltonian path is 1, 2, 3.) On the other hand, if
each of the players wins once, then there are 3 Hamiltonian paths. (For instance, if 1
beats 2, 2 beats 3, and 3 beats 1, then 1, 2, 3; 2, 3, 1; and 3, 1, 2 are all Hamiltonians.)
Hence, when n = 3, there is a maximum of 3 Hamiltonian paths.

We now show that there is an outcome of the tournament that results in more
than $n!/2^{n-1}$ Hamiltonian paths. To begin, let the outcome of the tournament specify
the result of each of the $\binom{n}{2}$ games played, and let A denote the set of all $2^{\binom{n}{2}}$
possible tournament outcomes. Then, with f(s) defined as the number of Hamiltonian
paths that result when the outcome is $s \in A$, we are asked to show that

$$\max_{s} f(s) \ge \frac{n!}{2^{n-1}}$$

To show this, consider the randomly chosen outcome S that is obtained when the
results of the $\binom{n}{2}$ games are independent, with each contestant being equally likely
to win each encounter. To determine E[f(S)], the expected number of Hamiltonian
paths that result from the outcome S, number the n! permutations, and, for
$i = 1, \ldots, n!$, let

$$X_i = \begin{cases} 1, & \text{if permutation } i \text{ is a Hamiltonian} \\ 0, & \text{otherwise} \end{cases}$$

Since

$$f(S) = \sum_{i} X_i$$

it follows that

$$E[f(S)] = \sum_{i} E[X_i]$$

Because, by the assumed independence of the outcomes of the games, the probability
that any specified permutation is a Hamiltonian is $(1/2)^{n-1}$, it follows that

$$E[X_i] = P\{X_i = 1\} = (1/2)^{n-1}$$

Therefore,

$$E[f(S)] = n!\,(1/2)^{n-1}$$

Since f(S) is not a constant random variable, the preceding equation implies that
there is an outcome of the tournament having more than $n!/2^{n-1}$ Hamiltonian
paths. ■
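
For very small n the argument can be verified by enumerating every tournament outcome. The sketch below is illustrative only: it numbers the players 0, ..., n − 1 rather than 1, ..., n, and the function name `hamiltonian_path_counts` is an assumption. It returns f(s) for each of the $2^{\binom{n}{2}}$ outcomes, so that the average can be compared with $n!/2^{n-1}$ and with the maximum.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import factorial


def hamiltonian_path_counts(n):
    """Number of Hamiltonian paths for each of the 2^C(n,2) outcomes."""
    games = list(combinations(range(n), 2))
    counts = []
    for results in product((0, 1), repeat=len(games)):
        # beats holds (winner, loser) pairs; a result of 0 means the
        # lower-numbered player of the pair wins that game.
        beats = {
            (a, b) if r == 0 else (b, a)
            for (a, b), r in zip(games, results)
        }
        counts.append(sum(
            all((perm[k], perm[k + 1]) in beats for k in range(n - 1))
            for perm in permutations(range(n))
        ))
    return counts


for n in (3, 4):
    counts = hamiltonian_path_counts(n)
    mean = Fraction(sum(counts), len(counts))
    print(n, mean, Fraction(factorial(n), 2 ** (n - 1)), max(counts))
```

The enumeration grows very quickly with n, which is exactly why the expectation argument is useful: it bounds the maximum without examining any individual outcome.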


A grove of 52 trees is arranged in a circular fashion. If 15 chipmunks live in these 
trees, show that there is a group of 7 consecutive trees that together house at least 3 
chipmunks. 


Solution Let the neighborhood of a tree consist of that tree along with the next six
trees visited by moving in the clockwise direction. We want to show that for any
choice of living accommodations of the 15 chipmunks, there is a tree that has at least
3 chipmunks living in its neighborhood. To show this, choose a tree at random and
let X denote the number of chipmunks that live in its neighborhood. To determine
E[X], arbitrarily number the 15 chipmunks and for $i = 1, \ldots, 15$, let

$$X_i = \begin{cases} 1, & \text{if chipmunk } i \text{ lives in the neighborhood of the randomly chosen tree} \\ 0, & \text{otherwise} \end{cases}$$

Because

$$X = \sum_{i=1}^{15} X_i$$

we obtain that

$$E[X] = \sum_{i=1}^{15} E[X_i]$$

However, because $X_i$ will equal 1 if the randomly chosen tree is any of the 7 trees
consisting of the tree in which chipmunk i lives along with its 6 neighboring trees
when moving in the counterclockwise direction,

$$E[X_i] = P\{X_i = 1\} = \frac{7}{52}$$

Consequently,

$$E[X] = \frac{105}{52} > 2$$

showing that there exists a tree with more than 2 chipmunks living in its
neighborhood. ■
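
The averaging step can be made concrete for any particular placement of the chipmunks. In this sketch the placement is generated at random with a fixed seed (an assumption made purely for illustration, as are the names `neighborhood_counts` and `homes`, and the numbering of trees from 0 to 51); whatever the placement, the 52 neighborhood counts sum to 7 × 15 = 105, so their average is 105/52 and their maximum must be at least 3.

```python
import random
from fractions import Fraction

TREES, WINDOW, CHIPMUNKS = 52, 7, 15


def neighborhood_counts(homes):
    """For each tree t, count the chipmunks living in trees
    t, t+1, ..., t+6 (clockwise, wrapping around the circle)."""
    per_tree = [0] * TREES
    for tree in homes:
        per_tree[tree] += 1
    return [
        sum(per_tree[(t + k) % TREES] for k in range(WINDOW))
        for t in range(TREES)
    ]


rng = random.Random(1)
homes = [rng.randrange(TREES) for _ in range(CHIPMUNKS)]  # hypothetical placement
counts = neighborhood_counts(homes)
average = Fraction(sum(counts), TREES)
print(average, max(counts))
```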


7.2.2 The Maximum-Minimums Identity


We start with an identity relating the maximum of a set of numbers to the minimums
of the subsets of these numbers.


Proposition 2.2

For arbitrary numbers $x_i$, $i = 1, \ldots, n$,

$$
\begin{aligned}
\max_{i} x_i = {} & \sum_{i} x_i - \sum_{i<j} \min(x_i, x_j) + \sum_{i<j<k} \min(x_i, x_j, x_k) \\
& + \cdots + (-1)^{n+1} \min(x_1, \ldots, x_n)
\end{aligned}
$$

Proof We will give a probabilistic proof of the proposition. To begin, assume that all
the $x_i$ are in the interval [0, 1]. Let U be a uniform (0, 1) random variable, and define
the events $A_i$, $i = 1, \ldots, n$, by $A_i = \{U < x_i\}$. That is, $A_i$ is the event that the uniform
random variable is less than $x_i$. Because at least one of these events $A_i$ will occur if
U is less than at least one of the values $x_i$, we have that

$$\bigcup_{i} A_i = \left\{U < \max_{i} x_i\right\}$$

Therefore,

$$P\left(\bigcup_{i} A_i\right) = P\left\{U < \max_{i} x_i\right\} = \max_{i} x_i$$

Also,

$$P(A_i) = P\{U < x_i\} = x_i$$

In addition, because all of the events $A_{i_1}, \ldots, A_{i_r}$ will occur if U is less than all the
values $x_{i_1}, \ldots, x_{i_r}$, we see that the intersection of these events is

$$A_{i_1} \cdots A_{i_r} = \left\{U < \min_{j=1,\ldots,r} x_{i_j}\right\}$$

implying that

$$P(A_{i_1} \cdots A_{i_r}) = P\left\{U < \min_{j=1,\ldots,r} x_{i_j}\right\} = \min_{j=1,\ldots,r} x_{i_j}$$

Thus, the proposition follows from the inclusion-exclusion formula for the
probability of the union of events:

$$
\begin{aligned}
P\left(\bigcup_{i} A_i\right) = {} & \sum_{i} P(A_i) - \sum_{i<j} P(A_i A_j) + \sum_{i<j<k} P(A_i A_j A_k) \\
& + \cdots + (-1)^{n+1} P(A_1 \cdots A_n)
\end{aligned}
$$

When the $x_i$ are nonnegative, but not restricted to the unit interval, let c be such
that all the $x_i$ are less than c. Then the identity holds for the values $y_i = x_i/c$, and the
desired result follows by multiplying through by c. When the $x_i$ can be negative, let
b be such that $x_i + b > 0$ for all i. Therefore, by the preceding,

$$
\begin{aligned}
\max_{i} (x_i + b) = {} & \sum_{i} (x_i + b) - \sum_{i<j} \min(x_i + b, x_j + b) \\
& + \cdots + (-1)^{n+1} \min(x_1 + b, \ldots, x_n + b)
\end{aligned}
$$

Letting

$$M = \sum_{i} x_i - \sum_{i<j} \min(x_i, x_j) + \cdots + (-1)^{n+1} \min(x_1, \ldots, x_n)$$

we can rewrite the foregoing identity as

$$\max_{i} x_i + b = M + b\left(n - \binom{n}{2} + \cdots + (-1)^{n+1}\binom{n}{n}\right)$$

But

$$0 = (1 - 1)^n = 1 - n + \binom{n}{2} + \cdots + (-1)^{n}\binom{n}{n}$$

The preceding two equations show that

$$\max_{i} x_i = M$$

and the proposition is proven.
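
Proposition 2.2 is a purely numerical identity, so it can be checked directly. The sketch below (the function name `max_via_minimums` and the sample lists are assumptions) evaluates the alternating sum of subset minimums and compares it with the maximum, including a case with negative and repeated values.

```python
from itertools import combinations


def max_via_minimums(xs):
    """Right-hand side of Proposition 2.2: the alternating sum, over
    subset sizes k = 1, ..., n, of the minimums of all k-element subsets."""
    total = 0
    for k in range(1, len(xs) + 1):
        sign = (-1) ** (k + 1)
        total += sign * sum(min(subset) for subset in combinations(xs, k))
    return total


for xs in ([3, -1, 4, 1, -5, 9], [2, 2, 7, -3]):
    print(xs, max(xs), max_via_minimums(xs))
```

Integer inputs keep the arithmetic exact; with floating-point inputs the two sides would agree only up to rounding.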


It follows from Proposition 2.2 that for any random variables $X_1, \ldots, X_n$,

$$\max_{i} X_i = \sum_{i} X_i - \sum_{i<j} \min(X_i, X_j) + \cdots + (-1)^{n+1} \min(X_1, \ldots, X_n)$$

Taking expectations of both sides of this equality yields the following relationship
between the expected value of the maximum and those of the partial minimums:

$$
\begin{aligned}
E\left[\max_{i} X_i\right] = {} & \sum_{i} E[X_i] - \sum_{i<j} E[\min(X_i, X_j)] \\
& + \cdots + (-1)^{n+1} E[\min(X_1, \ldots, X_n)]
\end{aligned}
\tag{2.7}
$$


Coupon collecting with unequal probabilities 


Suppose there are n types of coupons and that each time one collects a coupon, it
is, independently of previous coupons collected, a type i coupon with probability $p_i$,
$\sum_{i=1}^{n} p_i = 1$. Find the expected number of coupons one needs to collect to obtain a
complete set of at least one of each type.

Solution If we let $X_i$ denote the number of coupons one needs to collect to obtain
a type i, then we can express X as

$$X = \max_{i=1,\ldots,n} X_i$$

Because each new coupon obtained is a type i with probability $p_i$, $X_i$ is a geometric
random variable with parameter $p_i$. Also, because the minimum of $X_i$ and $X_j$ is the
number of coupons needed to obtain either a type i or a type j, it follows that for
$i \ne j$, $\min(X_i, X_j)$ is a geometric random variable with parameter $p_i + p_j$. Similarly,
$\min(X_i, X_j, X_k)$, the number needed to obtain any of types i, j, or k, is a geometric
random variable with parameter $p_i + p_j + p_k$, and so on. Therefore, the identity (2.7)
yields

$$
\begin{aligned}
E[X] = {} & \sum_{i} \frac{1}{p_i} - \sum_{i<j} \frac{1}{p_i + p_j} + \sum_{i<j<k} \frac{1}{p_i + p_j + p_k} \\
& + \cdots + (-1)^{n+1} \frac{1}{p_1 + \cdots + p_n}
\end{aligned}
$$

Noting that

$$\int_{0}^{\infty} e^{-px}\,dx = \frac{1}{p}$$

and using the identity

$$1 - \prod_{i=1}^{n} \left(1 - e^{-p_i x}\right) = \sum_{i} e^{-p_i x} - \sum_{i<j} e^{-(p_i + p_j)x} + \cdots + (-1)^{n+1} e^{-(p_1 + \cdots + p_n)x}$$

shows, upon integrating the identity, that

$$E[X] = \int_{0}^{\infty} \left(1 - \prod_{i=1}^{n} \left(1 - e^{-p_i x}\right)\right) dx$$

which is a more useful computational form. ■
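
The alternating-sum expression for E[X] is straightforward to evaluate for a handful of coupon types. The sketch below is illustrative (the function name `expected_coupons` is an assumption). With equal probabilities 1/N it should reproduce the value N(1 + 1/2 + ... + 1/N) obtained in the earlier coupon-collecting example, which gives a consistency check between the two derivations.

```python
from fractions import Fraction
from itertools import combinations


def expected_coupons(probs):
    """E[X] from identity (2.7): the alternating sum, over all nonempty
    subsets of types, of 1 / (sum of the subset's probabilities)."""
    total = Fraction(0)
    for k in range(1, len(probs) + 1):
        sign = (-1) ** (k + 1)
        total += sign * sum(1 / sum(subset) for subset in combinations(probs, k))
    return total


equal = [Fraction(1, 4)] * 4
harmonic = 4 * sum(Fraction(1, k) for k in range(1, 5))
print(expected_coupons(equal), harmonic)
print(expected_coupons([Fraction(1, 3), Fraction(2, 3)]))
```

The number of subsets doubles with each additional type, which is one reason the integral form above is described as more useful computationally.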


7.3 Moments of the Number of Events that Occur 


Many of the examples solved in the previous section were of the following form: For
given events $A_1, \ldots, A_n$, find E[X], where X is the number of these events that occur.
The solution then involved defining an indicator variable $I_i$ for event $A_i$ such that

$$I_i = \begin{cases} 1, & \text{if } A_i \text{ occurs} \\ 0, & \text{otherwise} \end{cases}$$

Because

$$X = \sum_{i=1}^{n} I_i$$

we obtained the result

$$E[X] = E\left[\sum_{i=1}^{n} I_i\right] = \sum_{i=1}^{n} E[I_i] = \sum_{i=1}^{n} P(A_i) \tag{3.1}$$

Now suppose we are interested in the number of pairs of events that occur.
Because $I_i I_j$ will equal 1 if both $A_i$ and $A_j$ occur and will equal 0 otherwise, it
follows that the number of pairs is equal to $\sum_{i<j} I_i I_j$. But because X is the number of
events that occur, it also follows that the number of pairs of events that occur is $\binom{X}{2}$.
Consequently,

$$\binom{X}{2} = \sum_{i<j} I_i I_j$$

where there are $\binom{n}{2}$ terms in the summation. Taking expectations yields

$$E\left[\binom{X}{2}\right] = \sum_{i<j} E[I_i I_j] = \sum_{i<j} P(A_i A_j) \tag{3.2}$$

or

$$E\left[\frac{X(X - 1)}{2}\right] = \sum_{i<j} P(A_i A_j)$$

giving that

$$E[X^2] - E[X] = 2\sum_{i<j} P(A_i A_j) \tag{3.3}$$

which yields $E[X^2]$, and thus $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.
Moreover, by considering the number of distinct subsets of k events that all
occur, we see that

$$\binom{X}{k} = \sum_{i_1 < i_2 < \cdots < i_k} I_{i_1} I_{i_2} \cdots I_{i_k}$$

Taking expectations gives the identity

$$E\left[\binom{X}{k}\right] = \sum_{i_1 < i_2 < \cdots < i_k} E[I_{i_1} I_{i_2} \cdots I_{i_k}] = \sum_{i_1 < i_2 < \cdots < i_k} P(A_{i_1} A_{i_2} \cdots A_{i_k}) \tag{3.4}$$

Moments of binomial random variables 


Consider n independent trials, with each trial being a success with probability p. Let
$A_i$ be the event that trial i is a success. When $i \ne j$, $P(A_i A_j) = p^2$. Consequently,
Equation (3.2) yields

$$E\left[\binom{X}{2}\right] = \sum_{i<j} p^2 = \binom{n}{2} p^2$$

or

$$E[X(X - 1)] = n(n - 1)p^2$$

or

$$E[X^2] - E[X] = n(n - 1)p^2$$

Now, $E[X] = \sum_{i=1}^{n} P(A_i) = np$, so, from the preceding equation

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = n(n - 1)p^2 + np - (np)^2 = np(1 - p)$$

which is in agreement with the result obtained in Section 4.6.1.
In general, because $P(A_{i_1} A_{i_2} \cdots A_{i_k}) = p^k$, we obtain from Equation (3.4) that

$$E\left[\binom{X}{k}\right] = \sum_{i_1 < i_2 < \cdots < i_k} p^k = \binom{n}{k} p^k$$

or, equivalently,

$$E[X(X - 1) \cdots (X - k + 1)] = n(n - 1) \cdots (n - k + 1)\,p^k$$

The successive values $E[X^k]$, $k \ge 3$, can be recursively obtained from this identity.
For instance, with k = 3, it yields

$$E[X(X - 1)(X - 2)] = n(n - 1)(n - 2)p^3$$

or

$$E[X^3 - 3X^2 + 2X] = n(n - 1)(n - 2)p^3$$

or

$$
\begin{aligned}
E[X^3] &= 3E[X^2] - 2E[X] + n(n - 1)(n - 2)p^3 \\
&= 3n(n - 1)p^2 + np + n(n - 1)(n - 2)p^3
\end{aligned}
$$

■
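
The binomial identities above can be confirmed by summing against the binomial mass function. This is a minimal sketch with assumed parameter values (n = 6, p = 1/3) and hypothetical helper names `binomial_pmf`, `mean_of_choose`, and `moment`.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Mass function of a binomial (n, p) random variable as a dict."""
    return {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}


def mean_of_choose(n, p, k):
    """E[C(X, k)] computed directly from the binomial mass function."""
    return sum(comb(x, k) * prob for x, prob in binomial_pmf(n, p).items())


def moment(n, p, r):
    """E[X^r] computed directly from the binomial mass function."""
    return sum(x**r * prob for x, prob in binomial_pmf(n, p).items())


n, p = 6, Fraction(1, 3)
for k in range(1, n + 1):
    print(k, mean_of_choose(n, p, k), comb(n, k) * p**k)

third = 3 * n * (n - 1) * p**2 + n * p + n * (n - 1) * (n - 2) * p**3
print(moment(n, p, 3), third)
```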


Moments of hypergeometric random variables 


Suppose n balls are randomly selected from an urn containing N balls, of which m
are white. Let $A_i$ be the event that the ith ball selected is white. Then X, the number
of white balls selected, is equal to the number of the events $A_1, \ldots, A_n$ that occur.
Because the ith ball selected is equally likely to be any of the N balls, of which m are
white, $P(A_i) = m/N$. Consequently, Equation (3.1) gives that
$E[X] = \sum_{i=1}^{n} P(A_i) = nm/N$. Also, since

$$P(A_i A_j) = P(A_i)P(A_j \mid A_i) = \frac{m}{N} \cdot \frac{m - 1}{N - 1}$$

we obtain, from Equation (3.2), that

$$E\left[\binom{X}{2}\right] = \sum_{i<j} \frac{m(m - 1)}{N(N - 1)} = \binom{n}{2} \frac{m(m - 1)}{N(N - 1)}$$

or

$$E[X(X - 1)] = n(n - 1)\frac{m(m - 1)}{N(N - 1)}$$

showing that

$$E[X^2] = n(n - 1)\frac{m(m - 1)}{N(N - 1)} + E[X]$$

This formula yields the variance of the hypergeometric, namely,

$$
\begin{aligned}
\mathrm{Var}(X) &= E[X^2] - (E[X])^2 \\
&= n(n - 1)\frac{m(m - 1)}{N(N - 1)} + \frac{nm}{N} - \frac{n^2 m^2}{N^2} \\
&= \frac{mn}{N}\left[\frac{(n - 1)(m - 1)}{N - 1} + 1 - \frac{mn}{N}\right]
\end{aligned}
$$

which agrees with the result obtained in Example 8j of Chapter 4.
Higher moments of X are obtained by using Equation (3.4). Because

$$P(A_{i_1} A_{i_2} \cdots A_{i_k}) = \frac{m(m - 1) \cdots (m - k + 1)}{N(N - 1) \cdots (N - k + 1)}$$

Equation (3.4) yields

$$E\left[\binom{X}{k}\right] = \binom{n}{k} \frac{m(m - 1) \cdots (m - k + 1)}{N(N - 1) \cdots (N - k + 1)}$$

or

$$E[X(X - 1) \cdots (X - k + 1)] = n(n - 1) \cdots (n - k + 1)\,\frac{m(m - 1) \cdots (m - k + 1)}{N(N - 1) \cdots (N - k + 1)}$$

■

Moments in the match problem 


For $i = 1, \ldots, N$, let $A_i$ be the event that person i selects his or her own hat in the
match problem. Then

$$P(A_i A_j) = P(A_i)P(A_j \mid A_i) = \frac{1}{N} \cdot \frac{1}{N - 1}$$

which follows because, conditional on person i selecting her own hat, the hat selected
by person j is equally likely to be any of the other N − 1 hats, of which one is his
own. Consequently, with X equal to the number of people who select their own hat,
it follows from Equation (3.2) that

$$E\left[\binom{X}{2}\right] = \sum_{i<j} \frac{1}{N(N - 1)} = \binom{N}{2} \frac{1}{N(N - 1)}$$

thus showing that

$$E[X(X - 1)] = 1$$

Therefore, $E[X^2] = 1 + E[X]$. Because $E[X] = \sum_{i=1}^{N} P(A_i) = 1$, we obtain that

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = 1$$

Hence, both the mean and variance of the number of matches is 1. For higher
moments, we use Equation (3.4), along with the fact that
$P(A_{i_1} A_{i_2} \cdots A_{i_k}) = \frac{1}{N(N - 1) \cdots (N - k + 1)}$, to obtain

$$E\left[\binom{X}{k}\right] = \binom{N}{k} \frac{1}{N(N - 1) \cdots (N - k + 1)}$$

or

$$E[X(X - 1) \cdots (X - k + 1)] = 1$$

■

Another coupon-collecting problem 


Suppose that there are N distinct types of coupons and that, independently of past 
types collected, each new one obtained is type j with probability $p_j$, $\sum_{j=1}^{N} p_j = 1$. 
Find the expected value and variance of the number of different types of coupons 
that appear among the first n collected. 


Solution We will find it more convenient to work with the number of uncollected 
types. So, let Y equal the number of types of coupons collected, and let $X = N - Y$ 
denote the number of uncollected types. With $A_i$ defined as the event that there are 
no type i coupons in the collection, X is equal to the number of the events $A_1,\ldots,A_N$ 
that occur. Because the types of the successive coupons collected are independent, 
and, with probability $1 - p_i$, each new coupon is not type i, we have 

$$P(A_i) = (1 - p_i)^n$$

Hence, $E[X] = \sum_{i=1}^{N}(1 - p_i)^n$, from which it follows that 

$$E[Y] = N - E[X] = N - \sum_{i=1}^{N}(1 - p_i)^n$$

Similarly, because each of the n coupons collected is neither a type i nor a type j 
coupon, with probability $1 - p_i - p_j$, we have 

$$P(A_iA_j) = (1 - p_i - p_j)^n, \quad i \neq j$$

Thus, 

$$E[X(X-1)] = 2\sum_{i<j} P(A_iA_j) = 2\sum_{i<j}(1 - p_i - p_j)^n$$

or 

$$E[X^2] = 2\sum_{i<j}(1 - p_i - p_j)^n + E[X]$$

Hence, we obtain 

$$\begin{aligned}
\mathrm{Var}(Y) &= \mathrm{Var}(X) \\
&= E[X^2] - (E[X])^2 \\
&= 2\sum_{i<j}(1 - p_i - p_j)^n + \sum_{i=1}^{N}(1 - p_i)^n - \left(\sum_{i=1}^{N}(1 - p_i)^n\right)^2
\end{aligned}$$

In the special case where $p_i = 1/N$, i = 1,...,N, the preceding formulas give 

$$E[Y] = N\left[1 - \left(1 - \frac{1}{N}\right)^n\right]$$

and 

$$\mathrm{Var}(Y) = N(N-1)\left(1 - \frac{2}{N}\right)^n + N\left(1 - \frac{1}{N}\right)^n - N^2\left(1 - \frac{1}{N}\right)^{2n}$$
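
As a sanity check on the coupon formulas, one can compare them with a direct enumeration of every possible sequence of n coupons. The Python sketch below does so exactly; the function names and the probabilities 1/2, 1/3, 1/6 are assumptions chosen for illustration, and the enumeration is feasible only for very small N and n.

```python
from fractions import Fraction
from itertools import product

def distinct_types_moments(p, n):
    """E[Y] and Var(Y) from the indicator-based formulas; p lists the type probabilities."""
    N = len(p)
    ex = sum((1 - pi) ** n for pi in p)                       # E[X], X = uncollected types
    pair = sum((1 - p[i] - p[j]) ** n
               for i in range(N) for j in range(i + 1, N))    # sum over i < j
    return N - ex, 2 * pair + ex - ex ** 2

def brute_force(p, n):
    """Same moments by enumerating all N**n coupon sequences."""
    mean = second = 0
    for seq in product(range(len(p)), repeat=n):
        prob = 1
        for t in seq:
            prob *= p[t]
        y = len(set(seq))            # number of distinct types collected
        mean += prob * y
        second += prob * y * y
    return mean, second - mean ** 2

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # illustrative (assumption)
n = 4
print(distinct_types_moments(p, n) == brute_force(p, n))
```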


The negative hypergeometric random variables 


Suppose an urn contains n + m balls, of which n are special and m are ordinary. These 
items are removed one at a time, with each new removal being equally likely to be 
any of the balls that remain in the urn. The random variable Y, equal to the number 
of balls that need be withdrawn until a total of r special balls have been removed, 
is said to have a negative hypergeometric distribution. The negative hypergeometric 
distribution bears the same relationship to the hypergeometric distribution as the 
negative binomial does to the binomial. That is, in both cases, rather than considering 
a random variable equal to the number of successes in a fixed number of trials (as 
are the binomial and hypergeometric variables), they refer to the number of trials 
needed to obtain a fixed number of successes. 

To obtain the probability mass function of a negative hypergeometric random 
variable Y, note that Y will equal k if both 


1. the first $k-1$ withdrawals consist of $r-1$ special and $k-r$ ordinary balls 
and 


2. the kth ball withdrawn is special. 

Consequently, 

$$P\{Y = k\} = \frac{\binom{n}{r-1}\binom{m}{k-r}}{\binom{n+m}{k-1}}\cdot\frac{n-r+1}{n+m-k+1}$$


We will not, however, utilize the preceding probability mass function to obtain the 
mean and variance of Y. Rather, let us number the m ordinary balls as $o_1,\ldots,o_m$, 
and then, for each i = 1,...,m, let $A_i$ be the event that $o_i$ is withdrawn before r 
special balls have been removed. Then, if X is the number of the events $A_1,\ldots,A_m$ 
that occur, it follows that X is the number of ordinary balls that are withdrawn before 
a total of r special balls have been removed. Consequently, 

$$Y = r + X$$

showing that 

$$E[Y] = r + E[X] = r + \sum_{i=1}^{m} P(A_i)$$

To determine $P(A_i)$, consider the n + 1 balls consisting of $o_i$ along with the n special 
balls. Of these n + 1 balls, $o_i$ is equally likely to be the first one withdrawn, or the 
second one withdrawn, ..., or the final one withdrawn. Hence, the probability that 
it is among the first r of these to be selected (and so is removed before a total of r 
special balls have been withdrawn) is $\frac{r}{n+1}$. Consequently, 

$$P(A_i) = \frac{r}{n+1}$$

and 

$$E[Y] = r + m\,\frac{r}{n+1} = \frac{r(n+m+1)}{n+1}$$

Thus, for instance, the expected number of cards of a well-shuffled deck that would 
need to be turned over until a spade appears is $1 + \frac{39}{14} = 3.786$, and the expected 
number of cards that would need to be turned over until an ace appears is 
$1 + \frac{48}{5} = 10.6$. 
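
The mean obtained from the indicator argument can be compared with the mean computed directly from the probability mass function. The sketch below assumes the mass function has the form C(n, r-1) C(m, k-r) / C(n+m, k-1) times (n-r+1)/(n+m-k+1) for k = r,...,r+m; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def neg_hypergeom_pmf(n, m, r):
    """P{Y = k} for n special and m ordinary balls, stopping at the r-th special ball."""
    return {k: Fraction(comb(n, r - 1) * comb(m, k - r), comb(n + m, k - 1))
               * Fraction(n - r + 1, n + m - k + 1)
            for k in range(r, r + m + 1)}

def mean_from_pmf(n, m, r):
    """E[Y] summed directly over the mass function."""
    return sum(k * p for k, p in neg_hypergeom_pmf(n, m, r).items())

def mean_closed_form(n, m, r):
    """E[Y] = r(n + m + 1)/(n + 1), from the indicator argument."""
    return Fraction(r * (n + m + 1), n + 1)

first_spade = mean_closed_form(13, 39, 1)   # 13 spades, 39 other cards
first_ace = mean_closed_form(4, 48, 1)      # 4 aces, 48 other cards
print(float(first_spade), float(first_ace))
```

The two card examples correspond to r = 1, with the special balls being the spades (n = 13) or the aces (n = 4).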

To determine Var(Y) = Var(X), we use the identity 

$$E[X(X-1)] = 2\sum_{i<j} P(A_iA_j)$$

Now, $P(A_iA_j)$ is the probability that both $o_i$ and $o_j$ are removed before there have 
been a total of r special balls removed. So consider the n + 2 balls consisting of $o_i$, $o_j$, 
and the n special balls. Because all withdrawal orderings of these balls are equally 
likely, the probability that $o_i$ and $o_j$ are both among the first r + 1 of them to be 
removed (and so are both removed before r special balls have been withdrawn) is 

$$P(A_iA_j) = \frac{\binom{2}{2}\binom{n}{r-1}}{\binom{n+2}{r+1}} = \frac{r(r+1)}{(n+1)(n+2)}$$

Consequently, 

$$E[X(X-1)] = 2\binom{m}{2}\frac{r(r+1)}{(n+1)(n+2)}$$

so 

$$E[X^2] = m(m-1)\frac{r(r+1)}{(n+1)(n+2)} + E[X]$$

Because $E[X] = m\frac{r}{n+1}$, this yields 

$$\mathrm{Var}(Y) = \mathrm{Var}(X) = m(m-1)\frac{r(r+1)}{(n+1)(n+2)} + m\frac{r}{n+1} - \left(m\frac{r}{n+1}\right)^2$$

A little algebra now shows that 

$$\mathrm{Var}(Y) = \frac{mr(n+1-r)(n+m+1)}{(n+1)^2(n+2)}$$


Example 
3f 


Singletons in the coupon collector’s problem 


Suppose that there are n distinct types of coupons and that, independently of past 
types collected, each new one obtained is equally likely to be any of the n types. 
Suppose also that one continues to collect coupons until a complete set of at least 
one of each type has been obtained. Find the expected value and variance of the 
number of types for which exactly one coupon of that type is collected. 


Solution Let X equal the number of types for which exactly one of that type is 
collected. Also, let $T_i$ denote the ith type of coupon to be collected, and let $A_i$ be 
the event that there is only a single type $T_i$ coupon in the complete set. Because X 
is equal to the number of the events $A_1,\ldots,A_n$ that occur, we have 

$$E[X] = \sum_{i=1}^{n} P(A_i)$$

Now, at the moment when the first type $T_i$ coupon is collected, there remain $n-i$ 
types that need to be collected to have a complete set. Because, starting at this 
moment, each of these $n-i+1$ types (the $n-i$ not yet collected and type $T_i$) 
is equally likely to be the last of these types to be collected, it follows that the type 
$T_i$ will be the last of these types (and so will be a singleton) with probability 
$\frac{1}{n-i+1}$. Consequently, $P(A_i) = \frac{1}{n-i+1}$, yielding 

$$E[X] = \sum_{i=1}^{n}\frac{1}{n-i+1} = \sum_{i=1}^{n}\frac{1}{i}$$

To determine the variance of the number of singletons, let $S_{i,j}$, for $i < j$, be the event 
that the first type $T_i$ coupon to be collected is still the only one of its type to have 
been collected at the moment that the first type $T_j$ coupon has been collected. Then 

$$P(A_iA_j) = P(A_iA_j \mid S_{i,j})P(S_{i,j})$$

Now, $P(S_{i,j})$ is the probability that when a type $T_i$ has just been collected, of the 
$n-i+1$ types consisting of type $T_i$ and the $n-i$ as yet uncollected types, a type $T_i$ 
is not among the first $j-i$ of these types to be collected. Because type $T_i$ is equally 
likely to be the first, or second, or ..., or $(n-i+1)$st of these types to be collected, 
we have 

$$P(S_{i,j}) = 1 - \frac{j-i}{n-i+1} = \frac{n+1-j}{n+1-i}$$

Now, conditional on the event $S_{i,j}$, both $A_i$ and $A_j$ will occur if, at the time the first 
type $T_j$ coupon is collected, of the $n-j+2$ types consisting of types $T_i$, $T_j$, and the 
$n-j$ as yet uncollected types, $T_i$ and $T_j$ are both collected after the other $n-j$. But 
this implies that 

$$P(A_iA_j \mid S_{i,j}) = 2\,\frac{1}{n-j+2}\,\frac{1}{n-j+1}$$

Therefore, 

$$P(A_iA_j) = \frac{2}{(n+1-i)(n+2-j)}, \quad i < j$$

yielding 

$$E[X(X-1)] = 4\sum_{i<j}\frac{1}{(n+1-i)(n+2-j)}$$

Consequently, using the previous result for E[X], we obtain 

$$\mathrm{Var}(X) = 4\sum_{i<j}\frac{1}{(n+1-i)(n+2-j)} + \sum_{i=1}^{n}\frac{1}{i} - \left(\sum_{i=1}^{n}\frac{1}{i}\right)^2$$


7.4 Covariance, Variance of Sums, and Correlations 


The following proposition shows that the expectation of a product of independent 
random variables is equal to the product of their expectations. 


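Proposition 4.1 If X and Y are independent, then, for any functions h and g, 

$$E[g(X)h(Y)] = E[g(X)]E[h(Y)]$$


Proof Suppose that X and Y are jointly continuous with joint density f(x, y). Then 

$$\begin{aligned}
E[g(X)h(Y)] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y)f(x,y)\,dx\,dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y)f_X(x)f_Y(y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} h(y)f_Y(y)\,dy\int_{-\infty}^{\infty} g(x)f_X(x)\,dx \\
&= E[h(Y)]E[g(X)]
\end{aligned}$$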


The proof in the discrete case is similar. 


Just as the expected value and the variance of a single random variable give 
us information about that random variable, so does the covariance between two 
random variables give us information about the relationship between the random 
variables. 


Definition 
The covariance between X and Y, denoted by Cov(X, Y), is defined by 

$$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$$

Upon expanding the right side of the preceding definition, we see that 

$$\begin{aligned}
\mathrm{Cov}(X, Y) &= E[XY - E[X]Y - XE[Y] + E[Y]E[X]] \\
&= E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y] \\
&= E[XY] - E[X]E[Y]
\end{aligned}$$

Note that if X and Y are independent, then, by Proposition 4.1, Cov(X, Y) = 0. 
However, the converse is not true. A simple example of two dependent random 
variables X and Y having zero covariance is obtained by letting X be a random 
variable such that 

$$P\{X = 0\} = P\{X = 1\} = P\{X = -1\} = \frac{1}{3}$$

and defining 

$$Y = \begin{cases} 0 & \text{if } X \neq 0 \\ 1 & \text{if } X = 0 \end{cases}$$

Now, XY = 0, so E[XY] = 0. Also, E[X] = 0. Thus, 

$$\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y] = 0$$

However, X and Y are clearly not independent. 
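
The zero-covariance example can be verified exactly in a few lines. The sketch below assumes that X is uniform on {-1, 0, 1} and that Y is the indicator of the event {X = 0}; the variable and function names are illustrative.

```python
from fractions import Fraction

# Joint pmf of (X, Y): X uniform on {-1, 0, 1}, Y = 1 if X == 0 else 0.
third = Fraction(1, 3)
joint = {(x, 1 if x == 0 else 0): third for x in (-1, 0, 1)}

def expect(f):
    """Expectation of f(X, Y) under the joint pmf."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P{X = 0, Y = 1} = P{X = 0} P{Y = 1}.
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y1 = sum(p for (x, y), p in joint.items() if y == 1)
p_joint = joint.get((0, 1), 0)

print(cov, p_joint, p_x0 * p_y1)
```

The covariance vanishes, yet the joint probability differs from the product of the marginals, so X and Y are dependent.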
The following proposition lists some of the properties of covariance. 


Proposition 4.2 
(i) Cov(X, Y) = Cov(Y, X) 
(ii) Cov(X, X) = Var(X) 
(iii) Cov(aX, Y) = a Cov(X, Y) 
(iv) $\mathrm{Cov}\left(\sum_{i=1}^{n} X_i, \sum_{j=1}^{m} Y_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{m}\mathrm{Cov}(X_i, Y_j)$ 


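Proof of Proposition 4.2 Parts (i) and (ii) follow immediately from the definition 
of covariance, and part (iii) is left as an exercise for the reader. To prove part (iv), 
which states that the covariance operation is additive (as is the operation of taking 
expectations), let $\mu_i = E[X_i]$ and $\nu_j = E[Y_j]$. Then 

$$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n}\mu_i, \qquad E\left[\sum_{j=1}^{m} Y_j\right] = \sum_{j=1}^{m}\nu_j$$

and 

$$\begin{aligned}
\mathrm{Cov}\left(\sum_{i=1}^{n} X_i, \sum_{j=1}^{m} Y_j\right) &= E\left[\left(\sum_{i=1}^{n} X_i - \sum_{i=1}^{n}\mu_i\right)\left(\sum_{j=1}^{m} Y_j - \sum_{j=1}^{m}\nu_j\right)\right] \\
&= E\left[\sum_{i=1}^{n}(X_i - \mu_i)\sum_{j=1}^{m}(Y_j - \nu_j)\right] \\
&= E\left[\sum_{i=1}^{n}\sum_{j=1}^{m}(X_i - \mu_i)(Y_j - \nu_j)\right] \\
&= \sum_{i=1}^{n}\sum_{j=1}^{m} E[(X_i - \mu_i)(Y_j - \nu_j)]
\end{aligned}$$

where the last equality follows because the expected value of a sum of random variables 
is equal to the sum of the expected values. 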


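It follows from parts (ii) and (iv) of Proposition 4.2, upon taking $Y_j = X_j$, 
j = 1,...,n, that 

$$\begin{aligned}
\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) &= \mathrm{Cov}\left(\sum_{i=1}^{n} X_i, \sum_{j=1}^{n} X_j\right) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n}\mathrm{Cov}(X_i, X_j) \\
&= \sum_{i=1}^{n}\left[\mathrm{Cov}(X_i, X_i) + \sum_{j \neq i}\mathrm{Cov}(X_i, X_j)\right] \\
&= \sum_{i=1}^{n}\mathrm{Var}(X_i) + \sum_{i=1}^{n}\sum_{j \neq i}\mathrm{Cov}(X_i, X_j)
\end{aligned}$$

Since each pair of indices i, j, $i \neq j$, appears twice in the double summation, the 
preceding formula is equivalent to 

$$\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n}\mathrm{Var}(X_i) + 2\sum_{i<j}\mathrm{Cov}(X_i, X_j) \qquad (4.1)$$

If $X_1,\ldots,X_n$ are pairwise independent, in that $X_i$ and $X_j$ are independent for 
$i \neq j$, then Equation (4.1) reduces to 

$$\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n}\mathrm{Var}(X_i)$$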


The following examples illustrate the use of Equation (4.1). 


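Example 
4a 

Let $X_1,\ldots,X_n$ be independent and identically distributed random variables having 
expected value $\mu$ and variance $\sigma^2$, and as in Example 2c, let $\bar{X} = \sum_{i=1}^{n} X_i/n$ be the 
sample mean. The quantities $X_i - \bar{X}$, i = 1,...,n, are called deviations, as they 
equal the differences between the individual data and the sample mean. The random 
variable 

$$S^2 = \sum_{i=1}^{n}\frac{(X_i - \bar{X})^2}{n-1}$$

is called the sample variance. Find (a) $\mathrm{Var}(\bar{X})$ and (b) $E[S^2]$. 


Solution 

(a) 
$$\begin{aligned}
\mathrm{Var}(\bar{X}) &= \left(\frac{1}{n}\right)^2\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) \\
&= \left(\frac{1}{n}\right)^2\sum_{i=1}^{n}\mathrm{Var}(X_i) \quad \text{by independence} \\
&= \frac{\sigma^2}{n}
\end{aligned}$$

(b) We start with the following algebraic identity: 

$$\begin{aligned}
(n-1)S^2 &= \sum_{i=1}^{n}(X_i - \mu + \mu - \bar{X})^2 \\
&= \sum_{i=1}^{n}(X_i - \mu)^2 + \sum_{i=1}^{n}(\bar{X} - \mu)^2 - 2(\bar{X} - \mu)\sum_{i=1}^{n}(X_i - \mu) \\
&= \sum_{i=1}^{n}(X_i - \mu)^2 + n(\bar{X} - \mu)^2 - 2(\bar{X} - \mu)\,n(\bar{X} - \mu) \\
&= \sum_{i=1}^{n}(X_i - \mu)^2 - n(\bar{X} - \mu)^2
\end{aligned}$$

Taking expectations of the preceding yields 

$$\begin{aligned}
(n-1)E[S^2] &= \sum_{i=1}^{n} E[(X_i - \mu)^2] - nE[(\bar{X} - \mu)^2] \\
&= n\sigma^2 - n\mathrm{Var}(\bar{X}) \\
&= (n-1)\sigma^2
\end{aligned}$$

where the final equality made use of part (a) of this example and the one preceding 
it made use of the result of Example 2c, namely, that $E[\bar{X}] = \mu$. Dividing through 
by $n-1$ shows that the expected value of the sample variance is the distribution 
variance $\sigma^2$. 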
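
Both parts of this example can be confirmed exactly for a small discrete distribution by enumerating every possible sample. In the sketch below, the support, the probabilities, and the sample size n = 3 are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 4]                                        # hypothetical support
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # hypothetical probabilities
n = 3                                                     # sample size

mu = sum(v * p for v, p in zip(values, probs))
sigma2 = sum((v - mu) ** 2 * p for v, p in zip(values, probs))

e_s2 = 0       # accumulates E[S^2]
e_xbar2 = 0    # accumulates E[Xbar^2]
for idx in product(range(len(values)), repeat=n):
    prob = 1
    for i in idx:
        prob *= probs[i]                                  # independence of the draws
    xs = [values[i] for i in idx]
    xbar = Fraction(sum(xs), n)
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)       # sample variance
    e_s2 += prob * s2
    e_xbar2 += prob * xbar ** 2

var_xbar = e_xbar2 - mu ** 2
print(e_s2 == sigma2, var_xbar == sigma2 / n)
```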

Our next example presents another method for obtaining the variance of a binomial 
random variable. 


Example 
4b 


Example 
4c 


Variance of a binomial random variable 
Compute the variance of a binomial random variable X with parameters n and p. 


Solution Since such a random variable represents the number of successes in n 
independent trials when each trial has the common probability p of being a success, we 
may write 

$$X = X_1 + \cdots + X_n$$

where the $X_i$ are independent Bernoulli random variables such that 

$$X_i = \begin{cases} 1 & \text{if the ith trial is a success} \\ 0 & \text{otherwise} \end{cases}$$

Hence, from Equation (4.1), we obtain 

$$\mathrm{Var}(X) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)$$

But 

$$\begin{aligned}
\mathrm{Var}(X_i) &= E[X_i^2] - (E[X_i])^2 \\
&= E[X_i] - (E[X_i])^2 \quad \text{since } X_i^2 = X_i \\
&= p - p^2
\end{aligned}$$

Thus, 

$$\mathrm{Var}(X) = np(1-p)$$


Sampling from a finite population 


Consider a set of N people, each of whom has an opinion about a certain subject 
that is measured by a real number v that represents the person's "strength of 
feeling" about the subject. Let $v_i$ represent the strength of feeling of person i, 
i = 1,...,N. 

Suppose that the quantities $v_i$, i = 1,...,N, are unknown and, to gather information, 
a group of n of the N people is "randomly chosen" in the sense that all of the 
$\binom{N}{n}$ subsets of size n are equally likely to be chosen. These n people are then 
questioned and their feelings determined. If S denotes the sum of the n sampled values, 
determine its mean and variance. 

An important application of the preceding problem is to a forthcoming election 
in which each person in the population is either for or against a certain candidate or 
proposition. If we take $v_i$ to equal 1 if person i is in favor and 0 if he or she is against, 
then $\bar{v} = \sum_{i=1}^{N} v_i/N$ represents the proportion of the population that is in favor. To 
estimate $\bar{v}$, a random sample of n people is chosen, and these people are polled. 
The proportion of those polled who are in favor (that is, S/n) is often used as an 
estimate of $\bar{v}$. 


Solution For each person i, i = 1,...,N, define an indicator variable $I_i$ to indicate 
whether or not that person is included in the sample. That is, 

$$I_i = \begin{cases} 1 & \text{if person i is in the random sample} \\ 0 & \text{otherwise} \end{cases}$$

Now, S can be expressed by 

$$S = \sum_{i=1}^{N} v_iI_i$$

so 

$$E[S] = \sum_{i=1}^{N} v_iE[I_i]$$

$$\begin{aligned}
\mathrm{Var}(S) &= \sum_{i=1}^{N}\mathrm{Var}(v_iI_i) + 2\sum_{i<j}\mathrm{Cov}(v_iI_i, v_jI_j) \\
&= \sum_{i=1}^{N} v_i^2\,\mathrm{Var}(I_i) + 2\sum_{i<j} v_iv_j\,\mathrm{Cov}(I_i, I_j)
\end{aligned}$$

Because 

$$E[I_i] = \frac{n}{N}, \qquad E[I_iI_j] = \frac{n}{N}\,\frac{n-1}{N-1}$$

it follows that 

$$\mathrm{Var}(I_i) = \frac{n}{N}\left(1 - \frac{n}{N}\right)$$

$$\mathrm{Cov}(I_i, I_j) = \frac{n(n-1)}{N(N-1)} - \left(\frac{n}{N}\right)^2 = \frac{-n(N-n)}{N^2(N-1)}$$

Hence, 

$$E[S] = n\sum_{i=1}^{N}\frac{v_i}{N} = n\bar{v}$$

$$\mathrm{Var}(S) = \frac{n}{N}\left(\frac{N-n}{N}\right)\sum_{i=1}^{N} v_i^2 - \frac{2n(N-n)}{N^2(N-1)}\sum_{i<j} v_iv_j$$

The expression for Var(S) can be simplified somewhat by using the identity 
$(v_1 + \cdots + v_N)^2 = \sum_{i=1}^{N} v_i^2 + 2\sum_{i<j} v_iv_j$. After some simplification, we obtain 

$$\mathrm{Var}(S) = \frac{n(N-n)}{N-1}\left(\frac{\sum_{i=1}^{N} v_i^2}{N} - \bar{v}^2\right)$$

Consider now the special case in which Np of the v's are equal to 1 and the 
remainder equal to 0. Then, in this case, S is a hypergeometric random variable and 
has mean and variance given, respectively, by 

$$E[S] = n\bar{v} = np \quad \text{since } \bar{v} = \frac{Np}{N} = p$$

and 

$$\mathrm{Var}(S) = \frac{n(N-n)}{N-1}\left(\frac{Np}{N} - p^2\right) = \frac{n(N-n)}{N-1}\,p(1-p)$$

The quantity S/n, equal to the proportion of those sampled that have values equal to 
1, is such that 

$$E\left[\frac{S}{n}\right] = p$$

$$\mathrm{Var}\left(\frac{S}{n}\right) = \frac{N-n}{n(N-1)}\,p(1-p)$$

The correlation of two random variables X and Y, denoted by $\rho(X, Y)$, is defined, 
as long as Var(X) Var(Y) is positive, by 

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}$$

It can be shown that 

$$-1 \le \rho(X, Y) \le 1 \qquad (4.2)$$

To prove Equation (4.2), suppose that X and Y have variances given by $\sigma_x^2$ and $\sigma_y^2$, 
respectively. Then, on the one hand, 

$$\begin{aligned}
0 &\le \mathrm{Var}\left(\frac{X}{\sigma_x} + \frac{Y}{\sigma_y}\right) \\
&= \frac{\mathrm{Var}(X)}{\sigma_x^2} + \frac{\mathrm{Var}(Y)}{\sigma_y^2} + \frac{2\,\mathrm{Cov}(X, Y)}{\sigma_x\sigma_y} \\
&= 2[1 + \rho(X, Y)]
\end{aligned}$$

implying that 

$$-1 \le \rho(X, Y)$$

On the other hand, 

$$\begin{aligned}
0 &\le \mathrm{Var}\left(\frac{X}{\sigma_x} - \frac{Y}{\sigma_y}\right) \\
&= \frac{\mathrm{Var}(X)}{\sigma_x^2} + \frac{\mathrm{Var}(Y)}{(-\sigma_y)^2} - \frac{2\,\mathrm{Cov}(X, Y)}{\sigma_x\sigma_y} \\
&= 2[1 - \rho(X, Y)]
\end{aligned}$$

implying that 

$$\rho(X, Y) \le 1$$

which completes the proof of Equation (4.2). 

In fact, since Var(Z) = 0 implies that Z is constant with probability 1 (this intuitive 
relationship will be rigorously proven in Chapter 8), it follows from the proof 
of Equation (4.2) that $\rho(X, Y) = 1$ implies that Y = a + bX, where $b = \sigma_y/\sigma_x > 0$, 
and $\rho(X, Y) = -1$ implies that Y = a + bX, where $b = -\sigma_y/\sigma_x < 0$. We leave it as 
an exercise for the reader to show that the reverse is also true: that if Y = a + bX, 
then $\rho(X, Y)$ is either +1 or -1, depending on the sign of b. 

The correlation coefficient is a measure of the degree of linearity between X 
and Y. A value of $\rho(X, Y)$ near +1 or -1 indicates a high degree of linearity between 
X and Y, whereas a value near 0 indicates that such linearity is absent. A positive 
value of $\rho(X, Y)$ indicates that Y tends to increase when X does, whereas a negative 
value indicates that Y tends to decrease when X increases. If $\rho(X, Y) = 0$, then X 
and Y are said to be uncorrelated. 


Example 
4d 


Example 
4e 


Let $I_A$ and $I_B$ be indicator variables for the events A and B. That is, 

$$I_A = \begin{cases} 1 & \text{if A occurs} \\ 0 & \text{otherwise} \end{cases} \qquad I_B = \begin{cases} 1 & \text{if B occurs} \\ 0 & \text{otherwise} \end{cases}$$

Then 

$$E[I_A] = P(A), \qquad E[I_B] = P(B), \qquad E[I_AI_B] = P(AB)$$

so 

$$\begin{aligned}
\mathrm{Cov}(I_A, I_B) &= P(AB) - P(A)P(B) \\
&= P(B)[P(A \mid B) - P(A)]
\end{aligned}$$

Thus, we obtain the quite intuitive result that the indicator variables for A and B 
are either positively correlated, uncorrelated, or negatively correlated, depending 
on whether $P(A \mid B)$ is, respectively, greater than, equal to, or less than P(A). 


Our next example shows that the sample mean and a deviation from the sample 
mean are uncorrelated. 


Let $X_1,\ldots,X_n$ be independent and identically distributed random variables having 
variance $\sigma^2$. Show that 

$$\mathrm{Cov}(X_i - \bar{X}, \bar{X}) = 0$$

Solution We have 

$$\begin{aligned}
\mathrm{Cov}(X_i - \bar{X}, \bar{X}) &= \mathrm{Cov}(X_i, \bar{X}) - \mathrm{Cov}(\bar{X}, \bar{X}) \\
&= \mathrm{Cov}\left(X_i, \frac{1}{n}\sum_{j=1}^{n} X_j\right) - \mathrm{Var}(\bar{X}) \\
&= \frac{1}{n}\sum_{j=1}^{n}\mathrm{Cov}(X_i, X_j) - \frac{\sigma^2}{n} \\
&= \frac{\sigma^2}{n} - \frac{\sigma^2}{n} = 0
\end{aligned}$$

where the next-to-last equality uses the result of Example 4a and the final equality 
follows because 

$$\mathrm{Cov}(X_i, X_j) = \begin{cases} 0 & \text{if } j \neq i \text{, by independence} \\ \sigma^2 & \text{if } j = i \text{, since } \mathrm{Var}(X_i) = \sigma^2 \end{cases}$$

Although $\bar{X}$ and the deviation $X_i - \bar{X}$ are uncorrelated, they are not, in general, 
independent. However, in the special case where the $X_i$ are normal random 
variables, it turns out that not only is $\bar{X}$ independent of a single deviation, but it is 
independent of the entire sequence of deviations $X_j - \bar{X}$, j = 1,...,n. This result 
will be established in Section 7.8, where we will also show that, in this case, the sample 
mean $\bar{X}$ and the sample variance $S^2$ are independent, with $(n-1)S^2/\sigma^2$ having 
a chi-squared distribution with $n-1$ degrees of freedom. (See Example 4a for the 
definition of $S^2$.) 


Example 
4f 


Consider m independent trials, each of which results in any of r possible outcomes 
with probabilities $p_1,\ldots,p_r$, $\sum_{i=1}^{r} p_i = 1$. If we let $N_i$, i = 1,...,r, denote the number 
of the m trials that result in outcome i, then $N_1, N_2,\ldots,N_r$ have the multinomial 
distribution 

$$P\{N_1 = n_1,\ldots,N_r = n_r\} = \frac{m!}{n_1!\cdots n_r!}\,p_1^{n_1}\cdots p_r^{n_r}, \qquad \sum_{i=1}^{r} n_i = m$$

For $i \neq j$, it seems likely that when $N_i$ is large, $N_j$ would tend to be small; hence, it is 
intuitive that they should be negatively correlated. Let us compute their covariance 
by using Proposition 4.2(iv) and the representation 

$$N_i = \sum_{k=1}^{m} I_i(k) \quad \text{and} \quad N_j = \sum_{k=1}^{m} I_j(k)$$

where 

$$I_i(k) = \begin{cases} 1 & \text{if trial k results in outcome i} \\ 0 & \text{otherwise} \end{cases} \qquad I_j(k) = \begin{cases} 1 & \text{if trial k results in outcome j} \\ 0 & \text{otherwise} \end{cases}$$

From Proposition 4.2(iv), we have 

$$\mathrm{Cov}(N_i, N_j) = \sum_{\ell=1}^{m}\sum_{k=1}^{m}\mathrm{Cov}(I_i(k), I_j(\ell))$$

Now, on the one hand, when $k \neq \ell$, 

$$\mathrm{Cov}(I_i(k), I_j(\ell)) = 0$$

since the outcome of trial k is independent of the outcome of trial $\ell$. On the other hand, 

$$\begin{aligned}
\mathrm{Cov}(I_i(\ell), I_j(\ell)) &= E[I_i(\ell)I_j(\ell)] - E[I_i(\ell)]E[I_j(\ell)] \\
&= 0 - p_ip_j = -p_ip_j
\end{aligned}$$

where the equation uses the fact that $I_i(\ell)I_j(\ell) = 0$, since trial $\ell$ cannot result in both 
outcome i and outcome j. Hence, we obtain 

$$\mathrm{Cov}(N_i, N_j) = -mp_ip_j$$

which is in accord with our intuition that $N_i$ and $N_j$ are negatively correlated. 
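
The multinomial covariance can be checked exactly for a small number of trials by enumerating all outcome sequences. In the sketch below, the function name `multinomial_cov` and the probabilities 1/2, 3/10, 1/5 with m = 4 trials are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def multinomial_cov(p, m, i, j):
    """Exact Cov(N_i, N_j) by enumerating all len(p)**m outcome sequences."""
    e_i = e_j = e_ij = 0
    for seq in product(range(len(p)), repeat=m):
        prob = 1
        for outcome in seq:
            prob *= p[outcome]                 # trials are independent
        n_i, n_j = seq.count(i), seq.count(j)  # counts of outcomes i and j
        e_i += prob * n_i
        e_j += prob * n_j
        e_ij += prob * n_i * n_j
    return e_ij - e_i * e_j

p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]   # illustrative (assumption)
m = 4
cov_01 = multinomial_cov(p, m, 0, 1)
print(cov_01, -m * p[0] * p[1])
```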


7.5 Conditional Expectation 


7.5.1 Definitions 


Recall that if X and Y are jointly discrete random variables, then the conditional 
probability mass function of X, given that Y = y, is defined for all y such that 
P{Y = y} > 0, by 

$$p_{X|Y}(x \mid y) = P\{X = x \mid Y = y\} = \frac{p(x, y)}{p_Y(y)}$$

It is therefore natural to define, in this case, the conditional expectation of X given 
that Y = y, for all values of y such that $p_Y(y) > 0$, by 

$$\begin{aligned}
E[X \mid Y = y] &= \sum_x xP\{X = x \mid Y = y\} \\
&= \sum_x x\,p_{X|Y}(x \mid y)
\end{aligned}$$


Example 
5a 


If X and Y are independent binomial random variables with identical parameters n 
and p, calculate the conditional expected value of X given that X + Y = m. 


Solution Let us first calculate the conditional probability mass function of X given 
that X + Y = m. For $k \le \min(n, m)$, 

$$\begin{aligned}
P\{X = k \mid X + Y = m\} &= \frac{P\{X = k, X + Y = m\}}{P\{X + Y = m\}} \\
&= \frac{P\{X = k, Y = m - k\}}{P\{X + Y = m\}} \\
&= \frac{P\{X = k\}P\{Y = m - k\}}{P\{X + Y = m\}} \\
&= \frac{\binom{n}{k}p^k(1-p)^{n-k}\binom{n}{m-k}p^{m-k}(1-p)^{n-m+k}}{\binom{2n}{m}p^m(1-p)^{2n-m}} \\
&= \frac{\binom{n}{k}\binom{n}{m-k}}{\binom{2n}{m}}
\end{aligned}$$

where we have used the fact (see Example 3f of Chapter 6) that X + Y is a binomial 
random variable with parameters 2n and p. Hence, the conditional distribution of X, 
given that X + Y = m, is the hypergeometric distribution, and from Example 2g, 
we obtain 

$$E[X \mid X + Y = m] = \frac{m}{2}$$

Similarly, let us recall that if X and Y are jointly continuous with a joint probability 
density function f(x, y), then the conditional probability density of X, given that 
Y = y, is defined for all values of y such that $f_Y(y) > 0$ by 

$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}$$

It is natural, in this case, to define the conditional expectation of X, given that 
Y = y, by 

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x f_{X|Y}(x \mid y)\,dx$$

provided that $f_Y(y) > 0$. 


Example 
5b 


Suppose that the joint density of X and Y is given by 


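$$f(x, y) = \frac{e^{-x/y}e^{-y}}{y}, \qquad 0 < x < \infty,\ 0 < y < \infty$$

Compute $E[X \mid Y = y]$. 

Solution We start by computing the conditional density 

$$\begin{aligned}
f_{X|Y}(x \mid y) &= \frac{f(x, y)}{f_Y(y)} \\
&= \frac{f(x, y)}{\int_{-\infty}^{\infty} f(x, y)\,dx} \\
&= \frac{(1/y)e^{-x/y}e^{-y}}{\int_0^{\infty}(1/y)e^{-x/y}e^{-y}\,dx} \\
&= \frac{(1/y)e^{-x/y}}{\int_0^{\infty}(1/y)e^{-x/y}\,dx} \\
&= \frac{1}{y}e^{-x/y}
\end{aligned}$$

Hence, the conditional distribution of X, given that Y = y, is just the exponential 
distribution with mean y. Thus, 

$$E[X \mid Y = y] = \int_0^{\infty}\frac{x}{y}e^{-x/y}\,dx = y$$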


Remark Just as conditional probabilities satisfy all of the properties of ordinary 
probabilities, so do conditional expectations satisfy the properties of ordinary 
expectations. For instance, such formulas as 

$$E[g(X) \mid Y = y] = \begin{cases} \sum_x g(x)\,p_{X|Y}(x \mid y) & \text{in the discrete case} \\ \int_{-\infty}^{\infty} g(x) f_{X|Y}(x \mid y)\,dx & \text{in the continuous case} \end{cases}$$

and 

$$E\left[\sum_{i=1}^{n} X_i \,\Big|\, Y = y\right] = \sum_{i=1}^{n} E[X_i \mid Y = y]$$

remain valid. As a matter of fact, conditional expectation given that Y = y can be 
thought of as being an ordinary expectation on a reduced sample space consisting 
only of outcomes for which Y = y. 


7.5.2 Computing Expectations by Conditioning 


Let us denote by $E[X \mid Y]$ that function of the random variable Y whose value at 
Y = y is $E[X \mid Y = y]$. Note that $E[X \mid Y]$ is itself a random variable. An extremely 
important property of conditional expectations is given by the following proposition. 


Proposition 
5.1 


The Conditional Expectation Formula 

$$E[X] = E[E[X \mid Y]] \qquad (5.1)$$

If Y is a discrete random variable, then Equation (5.1) states that 

$$E[X] = \sum_y E[X \mid Y = y]P\{Y = y\} \qquad (5.1a)$$

whereas if Y is continuous with density $f_Y(y)$, then Equation (5.1) states 

$$E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]f_Y(y)\,dy \qquad (5.1b)$$

We now give a proof of Equation (5.1) in the case where X and Y are both discrete 
random variables. 


Proof of Equation (5.1) When X and Y Are Discrete: We must show that 

$$E[X] = \sum_y E[X \mid Y = y]P\{Y = y\} \qquad (5.2)$$

Now, the right-hand side of Equation (5.2) can be written as 

$$\begin{aligned}
\sum_y E[X \mid Y = y]P\{Y = y\} &= \sum_y\sum_x xP\{X = x \mid Y = y\}P\{Y = y\} \\
&= \sum_y\sum_x x\,\frac{P\{X = x, Y = y\}}{P\{Y = y\}}P\{Y = y\} \\
&= \sum_y\sum_x xP\{X = x, Y = y\} \\
&= \sum_x x\sum_y P\{X = x, Y = y\} \\
&= \sum_x xP\{X = x\} \\
&= E[X]
\end{aligned}$$

and the result is proved. 

One way to understand Equation (5.2) is to interpret it as follows: To calculate 
E[X], we may take a weighted average of the conditional expected value of X 
given that Y = y, each of the terms $E[X \mid Y = y]$ being weighted by the probability 
of the event on which it is conditioned. (Of what does this remind you?) This 
is an extremely useful result that often enables us to compute expectations easily 
by first conditioning on some appropriate random variable. The following examples 
illustrate its use. 


Example 
5c 


Example 
5d 


A miner is trapped in a mine containing 3 doors. The first door leads to a tunnel that 
will take him to safety after 3 hours of travel. The second door leads to a tunnel that 
will return him to the mine after 5 hours of travel. The third door leads to a tunnel 
that will return him to the mine after 7 hours. If we assume that the miner is at all 
times equally likely to choose any one of the doors, what is the expected length of 
time until he reaches safety? 


Solution Let X denote the amount of time (in hours) until the miner reaches safety, 
and let Y denote the door he initially chooses. Now, 


$$\begin{aligned}
E[X] &= E[X \mid Y = 1]P\{Y = 1\} + E[X \mid Y = 2]P\{Y = 2\} + E[X \mid Y = 3]P\{Y = 3\} \\
&= \tfrac{1}{3}\left(E[X \mid Y = 1] + E[X \mid Y = 2] + E[X \mid Y = 3]\right)
\end{aligned}$$

However, 

$$\begin{aligned}
E[X \mid Y = 1] &= 3 \\
E[X \mid Y = 2] &= 5 + E[X] \qquad (5.3) \\
E[X \mid Y = 3] &= 7 + E[X]
\end{aligned}$$

To understand why Equation (5.3) is correct, consider, for instance, $E[X \mid Y = 2]$ 
and reason as follows: If the miner chooses the second door, he spends 5 hours in 
the tunnel and then returns to the mine. But once he returns to the mine, the problem 
is as before; thus, his expected additional time until safety is just E[X]. Hence, 
$E[X \mid Y = 2] = 5 + E[X]$. The argument behind the other equalities in Equation (5.3) 
is similar. Hence, 

$$E[X] = \tfrac{1}{3}\left(3 + 5 + E[X] + 7 + E[X]\right)$$

or 

$$E[X] = 15$$
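
The conditioning argument reduces to a single linear equation, which can be solved exactly and cross-checked by simulation. The following sketch is one way to do this; the function names, the number of trials, and the seed are illustrative assumptions.

```python
from fractions import Fraction
import random

def expected_escape_time(exit_times, loop_times):
    """Solve E = (sum(exit) + sum(loop) + len(loop) * E) / doors for E."""
    doors = len(exit_times) + len(loop_times)
    # Rearranged: E * (doors - len(loop_times)) = total of all travel times.
    return Fraction(sum(exit_times) + sum(loop_times), doors - len(loop_times))

def simulate(exit_times, loop_times, trials=20_000, seed=0):
    """Monte Carlo estimate of the mean escape time (doors chosen uniformly)."""
    rng = random.Random(seed)
    doors = [(t, True) for t in exit_times] + [(t, False) for t in loop_times]
    total = 0
    for _ in range(trials):
        while True:
            hours, safe = rng.choice(doors)
            total += hours
            if safe:
                break
    return total / trials

exact = expected_escape_time([3], [5, 7])   # one exit door, two looping doors
print(exact, simulate([3], [5, 7]))
```

The simulated value is only an estimate and will fluctuate around the exact answer.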


Expectation of a sum of a random number of random variables 


Suppose that the number of people entering a department store on a given day is 
a random variable with mean 50. Suppose further that the amounts of money spent 
by these customers are independent random variables having a common mean of 
$8. Finally, suppose also that the amount of money spent by a customer is also inde- 
pendent of the total number of customers who enter the store. What is the expected 
amount of money spent in the store on a given day? 


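Solution If we let N denote the number of customers who enter the store and $X_i$ the 
amount spent by the ith such customer, then the total amount of money spent can 
be expressed as $\sum_{i=1}^{N} X_i$. Now, 

$$E\left[\sum_{i=1}^{N} X_i\right] = E\left[E\left[\sum_{i=1}^{N} X_i \,\Big|\, N\right]\right]$$

But 

$$\begin{aligned}
E\left[\sum_{i=1}^{N} X_i \,\Big|\, N = n\right] &= E\left[\sum_{i=1}^{n} X_i \,\Big|\, N = n\right] \\
&= E\left[\sum_{i=1}^{n} X_i\right] \quad \text{by the independence of the } X_i \text{ and } N \\
&= nE[X] \quad \text{where } E[X] = E[X_i]
\end{aligned}$$

which implies that 

$$E\left[\sum_{i=1}^{N} X_i \,\Big|\, N\right] = NE[X]$$

Thus, 

$$E\left[\sum_{i=1}^{N} X_i\right] = E[NE[X]] = E[N]E[X]$$

Hence, in our example, the expected amount of money spent in the store is 50 × $8, 
or $400. 


Example 
5e 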


The game of craps is begun by rolling an ordinary pair of dice. If the sum of the 
dice is 2, 3, or 12, the player loses. If it is 7 or 11, the player wins. If it is any other 
number i, the player continues to roll the dice until the sum is either 7 or i. If it is 
7, the player loses; if it is i, the player wins. Let R denote the number of rolls of the 
dice in a game of craps. Find 

(a) E[R]; 

(b) E[R | player wins]; 

(c) E[R | player loses]. 

Solution If we let $P_i$ denote the probability that the sum of the dice is i, then 

$$P_i = P_{14-i} = \frac{i-1}{36}, \qquad i = 2,\ldots,7$$

To compute E[R], we condition on S, the initial sum, giving 

$$E[R] = \sum_{i=2}^{12} E[R \mid S = i]P_i$$

However, 

$$E[R \mid S = i] = \begin{cases} 1, & \text{if } i = 2, 3, 7, 11, 12 \\ 1 + \dfrac{1}{P_i + P_7}, & \text{otherwise} \end{cases}$$

The preceding equation follows because if the sum is a value i that does not end 
the game, then the dice will continue to be rolled until the sum is either i or 7 and 
the number of rolls until this occurs is a geometric random variable with parameter 
$P_i + P_7$. Therefore, 

$$\begin{aligned}
E[R] &= 1 + \sum_{i=4}^{6}\frac{P_i}{P_i + P_7} + \sum_{i=8}^{10}\frac{P_i}{P_i + P_7} \\
&= 1 + 2(3/9 + 4/10 + 5/11) = 3.376
\end{aligned}$$


To determine E[R | win], let us start by determining p, the probability that the player 
wins. Conditioning on S yields 

$$\begin{aligned}
p &= \sum_{i=2}^{12} P\{\text{win} \mid S = i\}P_i \\
&= P_7 + P_{11} + \sum_{i=4}^{6}\frac{P_i}{P_i + P_7}P_i + \sum_{i=8}^{10}\frac{P_i}{P_i + P_7}P_i \\
&= 0.493
\end{aligned}$$

where the preceding uses the fact that the probability of obtaining a sum of i before 
one of 7 is $P_i/(P_i + P_7)$. Now, let us determine the conditional probability mass 
function of S, given that the player wins. Letting $Q_i = P\{S = i \mid \text{win}\}$, we have 

$$Q_2 = Q_3 = Q_{12} = 0, \qquad Q_7 = P_7/p, \qquad Q_{11} = P_{11}/p$$

and, for i = 4, 5, 6, 8, 9, 10, 

$$\begin{aligned}
Q_i &= P\{S = i \mid \text{win}\} \\
&= \frac{P\{S = i, \text{win}\}}{P\{\text{win}\}} \\
&= \frac{P_iP\{\text{win} \mid S = i\}}{p} \\
&= \frac{P_i^2}{p(P_i + P_7)}
\end{aligned}$$

Now, conditioning on the initial sum gives 

$$E[R \mid \text{win}] = \sum_i E[R \mid \text{win}, S = i]Q_i$$

However, as was noted in Example 2j of Chapter 6, given that the initial sum is i, 
the number of additional rolls needed and the outcome (whether a win or a loss) 
are independent. (This is easily seen by first noting that conditional on an initial sum 
of i, the outcome is independent of the number of additional dice rolls needed and 
then using the symmetry property of independence, which states that if event A is 
independent of event B, then event B is independent of event A.) Therefore, 

$$\begin{aligned}
E[R \mid \text{win}] &= \sum_i E[R \mid S = i]Q_i \\
&= 1 + \sum_{i=4}^{6}\frac{Q_i}{P_i + P_7} + \sum_{i=8}^{10}\frac{Q_i}{P_i + P_7} \\
&= 2.938
\end{aligned}$$

Although we could determine E[R | player loses] exactly as we did E[R | player 
wins], it is easier to use 

$$E[R] = E[R \mid \text{win}]\,p + E[R \mid \text{lose}](1 - p)$$

implying that 

$$E[R \mid \text{lose}] = \frac{E[R] - E[R \mid \text{win}]\,p}{1 - p} = 3.801$$
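
The numerical values in this example can be reproduced with exact rational arithmetic. The sketch below follows the same conditioning on the initial sum S; the function and variable names are illustrative assumptions.

```python
from fractions import Fraction

# P[i] = probability that two fair dice sum to i
P = {i: Fraction(6 - abs(7 - i), 36) for i in range(2, 13)}
points = (4, 5, 6, 8, 9, 10)

def rolls_given_start(i):
    """E[R | S = i]: one roll, plus a geometric wait if i is a point."""
    return 1 + 1 / (P[i] + P[7]) if i in points else Fraction(1)

def win_given_start(i):
    """P{win | S = i}."""
    if i in (7, 11):
        return Fraction(1)
    if i in points:
        return P[i] / (P[i] + P[7])
    return Fraction(0)

e_r = sum(rolls_given_start(i) * P[i] for i in P)
p_win = sum(win_given_start(i) * P[i] for i in P)

# Q[i] = P{S = i | win}; given S = i, roll count and outcome are independent.
Q = {i: P[i] * win_given_start(i) / p_win for i in P}
e_r_win = sum(rolls_given_start(i) * Q[i] for i in P)
e_r_lose = (e_r - e_r_win * p_win) / (1 - p_win)

print(float(e_r), float(p_win), float(e_r_win), float(e_r_lose))
```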


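As defined in Example 5d of Chapter 6, the bivariate normal joint density function 
of the random variables X and Y is 

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 + \left(\frac{y-\mu_y}{\sigma_y}\right)^2 - 2\rho\frac{(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\}$$

We will now show that $\rho$ is the correlation between X and Y. As shown in Example 5c, 
$\mu_x = E[X]$, $\sigma_x^2 = \mathrm{Var}(X)$, and $\mu_y = E[Y]$, $\sigma_y^2 = \mathrm{Var}(Y)$. Consequently, 

$$\begin{aligned}
\mathrm{Corr}(X, Y) &= \frac{\mathrm{Cov}(X, Y)}{\sigma_x\sigma_y} \\
&= \frac{E[XY] - \mu_x\mu_y}{\sigma_x\sigma_y}
\end{aligned}$$

To determine E[XY], we condition on Y. That is, we use the identity 

$$E[XY] = E[E[XY \mid Y]]$$

Recalling from Example 5d that the conditional distribution of X given that Y = y 
is normal with mean $\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y)$, we see that 

$$\begin{aligned}
E[XY \mid Y = y] &= E[Xy \mid Y = y] \\
&= yE[X \mid Y = y] \\
&= y\left[\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y)\right] \\
&= y\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y^2 - \mu_yy)
\end{aligned}$$

Consequently, 

$$E[XY \mid Y] = Y\mu_x + \rho\frac{\sigma_x}{\sigma_y}(Y^2 - \mu_yY)$$

implying that 

$$\begin{aligned}
E[XY] &= E\left[Y\mu_x + \rho\frac{\sigma_x}{\sigma_y}(Y^2 - \mu_yY)\right] \\
&= \mu_xE[Y] + \rho\frac{\sigma_x}{\sigma_y}E[Y^2 - \mu_yY] \\
&= \mu_x\mu_y + \rho\frac{\sigma_x}{\sigma_y}\left(E[Y^2] - \mu_y^2\right) \\
&= \mu_x\mu_y + \rho\frac{\sigma_x}{\sigma_y}\mathrm{Var}(Y) \\
&= \mu_x\mu_y + \rho\sigma_x\sigma_y
\end{aligned}$$

Therefore, 

$$\mathrm{Corr}(X, Y) = \frac{\rho\sigma_x\sigma_y}{\sigma_x\sigma_y} = \rho$$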


Sometimes E[X] is easy to compute, and we use the conditioning identity to 
compute a conditional expected value. This approach is illustrated by our next 
example. 


Consider n independent trials, each of which results in one of the outcomes 1,...,k, 
with respective probabilities $p_1,\ldots,p_k$, $\sum_{i=1}^{k} p_i = 1$. Let $N_i$ denote the number of 
trials that result in outcome i, i = 1,...,k. For $i \neq j$, find 

(a) $E[N_j \mid N_i > 0]$ and (b) $E[N_j \mid N_i > 1]$ 

Solution To solve (a), let 

$$I = \begin{cases} 0, & \text{if } N_i = 0 \\ 1, & \text{if } N_i > 0 \end{cases}$$

Then 

$$E[N_j] = E[N_j \mid I = 0]P\{I = 0\} + E[N_j \mid I = 1]P\{I = 1\}$$

or, equivalently, 

$$E[N_j] = E[N_j \mid N_i = 0]P\{N_i = 0\} + E[N_j \mid N_i > 0]P\{N_i > 0\}$$

Now, the unconditional distribution of $N_j$ is binomial with parameters n, $p_j$. Also, 
given that $N_i = r$, each of the $n - r$ trials that does not result in outcome i will, 
independently, result in outcome j with probability $P(j \mid \text{not } i) = \frac{p_j}{1-p_i}$. Consequently, 
the conditional distribution of $N_j$, given that $N_i = r$, is binomial with parameters 
$n - r$, $\frac{p_j}{1-p_i}$. (For a more detailed argument for this conclusion, see Example 4c of 
Chapter 6.) Because $P\{N_i = 0\} = (1 - p_i)^n$, the preceding equation yields 

$$np_j = n\frac{p_j}{1-p_i}(1 - p_i)^n + E[N_j \mid N_i > 0]\left(1 - (1 - p_i)^n\right)$$

giving the result 

$$E[N_j \mid N_i > 0] = np_j\,\frac{1 - (1 - p_i)^{n-1}}{1 - (1 - p_i)^n}$$

We can solve part (b) in a similar manner. Let 

$$J = \begin{cases} 0, & \text{if } N_i = 0 \\ 1, & \text{if } N_i = 1 \\ 2, & \text{if } N_i > 1 \end{cases}$$

Then 

$$E[N_j] = E[N_j \mid J = 0]P\{J = 0\} + E[N_j \mid J = 1]P\{J = 1\} + E[N_j \mid J = 2]P\{J = 2\}$$

or, equivalently, 

$$E[N_j] = E[N_j \mid N_i = 0]P\{N_i = 0\} + E[N_j \mid N_i = 1]P\{N_i = 1\} + E[N_j \mid N_i > 1]P\{N_i > 1\}$$

This equation yields 

$$np_j = n\frac{p_j}{1-p_i}(1 - p_i)^n + (n-1)\frac{p_j}{1-p_i}\,np_i(1 - p_i)^{n-1} + E[N_j \mid N_i > 1]\left(1 - (1 - p_i)^n - np_i(1 - p_i)^{n-1}\right)$$

giving the result 

$$E[N_j \mid N_i > 1] = \frac{np_j\left[1 - (1 - p_i)^{n-1} - (n-1)p_i(1 - p_i)^{n-2}\right]}{1 - (1 - p_i)^n - np_i(1 - p_i)^{n-1}}$$

It is also possible to obtain the variance of a random variable by conditioning. 
We illustrate this approach by the following example. 


Variance of the geometric distribution 


Independent trials, each resulting in a success with probability p, are successively 
performed. Let N be the time of the first success. Find Var(N). 


Solution Let $Y = 1$ if the first trial results in a success and $Y = 0$ otherwise. Now,
$$\operatorname{Var}(N) = E[N^2] - (E[N])^2$$
To calculate $E[N^2]$, we condition on $Y$ as follows:
$$E[N^2] = E[E[N^2 \mid Y]]$$
However,
$$E[N^2 \mid Y = 1] = 1$$
$$E[N^2 \mid Y = 0] = E[(1 + N)^2]$$

These two equations follow because, on the one hand, if the first trial results in a
success, then, clearly, $N = 1$; thus, $N^2 = 1$. On the other hand, if the first trial results
in a failure, then the total number of trials necessary for the first success will have
the same distribution as 1 (the first trial that results in failure) plus the necessary
number of additional trials. Since the latter quantity has the same distribution as $N$,
we obtain $E[N^2 \mid Y = 0] = E[(1 + N)^2]$. Hence,
$$\begin{aligned}
E[N^2] &= E[N^2 \mid Y = 1]P\{Y = 1\} + E[N^2 \mid Y = 0]P\{Y = 0\} \\
&= p + (1 - p)E[(1 + N)^2] \\
&= 1 + (1 - p)E[2N + N^2]
\end{aligned}$$
However, as was shown in Example 8b of Chapter 4, $E[N] = 1/p$; therefore,
$$E[N^2] = 1 + \frac{2(1 - p)}{p} + (1 - p)E[N^2]$$
or
$$E[N^2] = \frac{2 - p}{p^2}$$
Consequently,
$$\begin{aligned}
\operatorname{Var}(N) &= E[N^2] - (E[N])^2 \\
&= \frac{2 - p}{p^2} - \left(\frac{1}{p}\right)^2 \\
&= \frac{1 - p}{p^2}
\end{aligned}$$


Example 5i  Consider a gambling situation in which there are $r$ players, with player $i$ initially
having $n_i$ units, $n_i > 0$, $i = 1, \ldots, r$. At each stage, two of the players are chosen to
play a game, with the winner of the game receiving 1 unit from the loser. Any player
whose fortune drops to 0 is eliminated, and this continues until a single player has
all $n = \sum_{i=1}^{r} n_i$ units, with that player designated as the victor. Assuming that the
results of successive games are independent and that each game is equally likely to
be won by either of its two players, find the average number of stages until one of
the players has all $n$ units.


Solution To find the expected number of stages played, suppose first that there are
only 2 players, with players 1 and 2 initially having $j$ and $n - j$ units, respectively.
Let $X_j$ denote the number of stages that will be played, and let $m_j = E[X_j]$. Then,
for $j = 1, \ldots, n - 1$,
$$X_j = 1 + A_j$$
where $A_j$ is the additional number of stages needed beyond the first stage. Taking
expectations gives
$$m_j = 1 + E[A_j]$$
Conditioning on the result of the first stage then yields
$$m_j = 1 + E[A_j \mid 1 \text{ wins first stage}]\tfrac{1}{2} + E[A_j \mid 2 \text{ wins first stage}]\tfrac{1}{2}$$
Now, if player 1 wins at the first stage, then the situation from that point on is exactly
the same as in a problem that supposes that player 1 starts with $j + 1$ and player 2
with $n - (j + 1)$ units. Consequently,
$$E[A_j \mid 1 \text{ wins first stage}] = m_{j+1}$$
and, analogously,
$$E[A_j \mid 2 \text{ wins first stage}] = m_{j-1}$$
Thus,
$$m_j = 1 + \tfrac{1}{2}m_{j+1} + \tfrac{1}{2}m_{j-1}$$
or, equivalently,
$$m_{j+1} = 2m_j - m_{j-1} - 2, \quad j = 1, \ldots, n - 1 \qquad (5.4)$$
Using that $m_0 = 0$, the preceding equation yields
$$\begin{aligned}
m_2 &= 2m_1 - 2 \\
m_3 &= 2m_2 - m_1 - 2 = 3m_1 - 6 = 3(m_1 - 2) \\
m_4 &= 2m_3 - m_2 - 2 = 4m_1 - 12 = 4(m_1 - 3)
\end{aligned}$$
suggesting that
$$m_i = i(m_1 - i + 1), \quad i = 1, \ldots, n \qquad (5.5)$$


To prove the preceding equality, we use mathematical induction. Since we've already
shown the equation to be true for $i = 1, 2$, we take as the induction hypothesis that
it is true whenever $i \leq j < n$. Now we must prove that it is true for $j + 1$. Using
Equation (5.4) yields
$$\begin{aligned}
m_{j+1} &= 2m_j - m_{j-1} - 2 \\
&= 2j(m_1 - j + 1) - (j - 1)(m_1 - j + 2) - 2 \quad \text{(by the induction hypothesis)} \\
&= (j + 1)m_1 - 2j^2 + 2j + j^2 - 3j + 2 - 2 \\
&= (j + 1)m_1 - j^2 - j \\
&= (j + 1)(m_1 - j)
\end{aligned}$$
which completes the induction proof of (5.5). Letting $i = n$ in (5.5), and using that
$m_n = 0$, now yields that
$$m_1 = n - 1$$
which, again using (5.5), gives the result
$$m_i = i(n - i)$$


Thus, the mean number of games played when there are only 2 players with initial
amounts $i$ and $n - i$ is the product of their initial amounts. Because both players
play all stages, this is also the mean number of stages involving player 1.

Now let us return to the problem involving $r$ players with initial amounts $n_i$,
$i = 1, \ldots, r$, $\sum_{i=1}^{r} n_i = n$. Let $X$ denote the number of stages needed to obtain a victor,
and let $X_i$ denote the number of stages involving player $i$. Now, from the point of
view of player $i$, starting with $n_i$, he will continue to play stages, independently being
equally likely to win or lose each one, until his fortune is either $n$ or 0. Thus, the
number of stages he plays is exactly the same as when he has a single opponent with
an initial fortune of $n - n_i$. Consequently, by the preceding result, it follows that
$$E[X_i] = n_i(n - n_i)$$
so
$$E\left[\sum_{i=1}^{r} X_i\right] = \sum_{i=1}^{r} n_i(n - n_i) = n^2 - \sum_{i=1}^{r} n_i^2$$
But because each stage involves two players,
$$X = \frac{1}{2}\sum_{i=1}^{r} X_i$$
Taking expectations now yields
$$E[X] = \frac{1}{2}\left(n^2 - \sum_{i=1}^{r} n_i^2\right)$$


It is interesting to note that while our argument shows that the mean number of
stages does not depend on the manner in which the teams are selected at each stage,
the same is not true for the distribution of the number of stages. To see this, suppose
$r = 3$, $n_1 = n_2 = 1$, and $n_3 = 2$. If players 1 and 2 are chosen in the first stage, then
it will take at least three stages to determine a winner, whereas if player 3 is in the
first stage, then it is possible for there to be only two stages.
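The two-player recursion and the resulting formula for $r$ players can be checked with a few lines of Python. This is a minimal sketch: it iterates Equation (5.4) from $m_0 = 0$ and $m_1 = n - 1$, and evaluates $\frac{1}{2}\left(n^2 - \sum_i n_i^2\right)$, the mean number of stages obtained by halving $E\left[\sum_i X_i\right]$ because each stage involves two players. The function names are illustrative assumptions.

```python
def two_player_means(n):
    """Return [m_0, ..., m_n] by iterating m_{j+1} = 2 m_j - m_{j-1} - 2,
    starting from m_0 = 0 and m_1 = n - 1."""
    m = [0, n - 1]
    for j in range(1, n):
        m.append(2 * m[j] - m[j - 1] - 2)
    return m


def expected_stages(fortunes):
    """Mean number of stages for players with the given initial fortunes:
    (n^2 - sum of n_i^2) / 2, where n is the total number of units."""
    n = sum(fortunes)
    return (n * n - sum(k * k for k in fortunes)) / 2
```

For instance, with $n = 10$ the list returned by `two_player_means(10)` should match $m_j = j(n - j)$ and end in $m_n = 0$, and for two players `expected_stages([j, n - j])` reduces to the product $j(n - j)$.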


In our next example, we use conditioning to verify a result previously noted in 
Section 6.3.1: that the expected number of uniform (0, 1) random variables that need 
to be added for their sum to exceed 1 is equal to e. 


Let $U_1, U_2, \ldots$ be a sequence of independent uniform (0, 1) random variables. Find
$E[N]$ when
$$N = \min\left\{n : \sum_{i=1}^{n} U_i > 1\right\}$$

Solution We will find $E[N]$ by obtaining a more general result. For $x \in [0, 1]$, let
$$N(x) = \min\left\{n : \sum_{i=1}^{n} U_i > x\right\}$$
and set
$$m(x) = E[N(x)]$$
That is, $N(x)$ is the number of uniform (0, 1) random variables we must add until
their sum exceeds $x$, and $m(x)$ is its expected value. We will now derive an equation
for $m(x)$ by conditioning on $U_1$. This gives, from Equation (5.1b),
$$m(x) = \int_0^1 E[N(x) \mid U_1 = y]\,dy \qquad (5.6)$$
Now,
$$E[N(x) \mid U_1 = y] = \begin{cases} 1 & \text{if } y > x \\ 1 + m(x - y) & \text{if } y \leq x \end{cases} \qquad (5.7)$$
The preceding formula is obviously true when $y > x$. It is also true when $y \leq x$,
since, if the first uniform value is $y$, then, at that point, the remaining number of
uniform random variables needed is the same as if we were just starting and were
going to add uniform random variables until their sum exceeded $x - y$. Substituting
Equation (5.7) into Equation (5.6) gives
$$\begin{aligned}
m(x) &= 1 + \int_0^x m(x - y)\,dy \\
&= 1 + \int_0^x m(u)\,du \quad \text{by letting } u = x - y
\end{aligned}$$
Differentiating the preceding equation yields
$$m'(x) = m(x)$$
or, equivalently,
$$\frac{m'(x)}{m(x)} = 1$$
Integrating this equation gives
$$\log[m(x)] = x + c$$
or
$$m(x) = ke^x$$
Since $m(0) = 1$, it follows that $k = 1$, so we obtain
$$m(x) = e^x$$
Therefore, $m(1)$, the expected number of uniform (0, 1) random variables that need
to be added until their sum exceeds 1, is equal to $e$.
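The result $m(x) = e^x$ is easy to probe by simulation. The following Python sketch estimates $m(x)$ by repeatedly adding uniform (0, 1) values until the running sum exceeds $x$; the function names, the number of trials, and the fixed seed are illustrative assumptions, and the estimate is only accurate up to Monte Carlo error.

```python
import math
import random


def count_until_exceeds(x, rng):
    """Number of uniform(0, 1) draws needed for the running sum to exceed x."""
    total, count = 0.0, 0
    while total <= x:
        total += rng.random()
        count += 1
    return count


def estimate_m(x, trials=200_000, seed=0):
    """Monte Carlo estimate of m(x) = E[N(x)]."""
    rng = random.Random(seed)
    return sum(count_until_exceeds(x, rng) for _ in range(trials)) / trials
```

Comparing `estimate_m(1.0)` with `math.e`, or `estimate_m(0.5)` with `math.exp(0.5)`, should show agreement to about two decimal places at this sample size.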


7.5.3 Computing Probabilities by Conditioning 


Not only can we obtain expectations by first conditioning on an appropriate random 
variable, but we can also use this approach to compute probabilities. To see this, let 
A denote an arbitrary event, and define the indicator random variable X by 


$$X = \begin{cases} 1 & \text{if } A \text{ occurs} \\ 0 & \text{if } A \text{ does not occur} \end{cases}$$


It follows from the definition of X that 


E[X] = P(A) 
E[X|Y = y]= P(A|Y = y) for any random variable Y 


Therefore, from Equations (5.1a) and (5.1b), we obtain 


$$P(A) = \begin{cases} \displaystyle\sum_y P(A \mid Y = y)P(Y = y) & \text{if } Y \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} P(A \mid Y = y)f_Y(y)\,dy & \text{if } Y \text{ is continuous} \end{cases} \qquad (5.8)$$

Note that if $Y$ is a discrete random variable taking on one of the values $y_1, \ldots, y_n$,
then by defining the events $B_i$, $i = 1, \ldots, n$, by $B_i = \{Y = y_i\}$, Equation (5.8) reduces
to the familiar equation
$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)P(B_i)$$
where $B_1, \ldots, B_n$ are mutually exclusive events whose union is the sample space.

The best-prize problem


Suppose that we are to be presented with $n$ distinct prizes, in sequence. After being
presented with a prize, we must immediately decide whether to accept it or to reject
it and consider the next prize. The only information we are given when deciding
whether to accept a prize is the relative rank of that prize compared to ones already
seen. That is, for instance, when the fifth prize is presented, we learn how it compares
with the four prizes we've already seen. Suppose that once a prize is rejected, it is
lost, and that our objective is to maximize the probability of obtaining the best prize.
Assuming that all $n!$ orderings of the prizes are equally likely, how well can we do?


Solution Rather surprisingly, we can do quite well. To see this, fix a value $k$, $0 \leq k < n$,
and consider the strategy that rejects the first $k$ prizes and then accepts the
first one that is better than all of those first $k$. Let $P_k(\text{best})$ denote the probability that
the best prize is selected when this strategy is employed. To compute this probability,
condition on $X$, the position of the best prize. This gives
$$\begin{aligned}
P_k(\text{best}) &= \sum_{i=1}^{n} P_k(\text{best} \mid X = i)P(X = i) \\
&= \frac{1}{n}\sum_{i=1}^{n} P_k(\text{best} \mid X = i)
\end{aligned}$$
Now, on the one hand, if the overall best prize is among the first $k$, then no prize is
ever selected under the strategy considered. That is,
$$P_k(\text{best} \mid X = i) = 0 \quad \text{if } i \leq k$$
On the other hand, if the best prize is in position $i$, where $i > k$, then the best prize
will be selected if the best of the first $i - 1$ prizes is among the first $k$ (for then none
of the prizes in positions $k + 1, k + 2, \ldots, i - 1$ would be selected). But, conditional
on the best prize being in position $i$, it is easy to verify that all possible orderings of
the other prizes remain equally likely, which implies that each of the first $i - 1$ prizes
is equally likely to be the best of that batch. Hence, we have
$$\begin{aligned}
P_k(\text{best} \mid X = i) &= P\{\text{best of first } i - 1 \text{ is among the first } k \mid X = i\} \\
&= \frac{k}{i - 1} \quad \text{if } i > k
\end{aligned}$$
From the preceding, we obtain
$$\begin{aligned}
P_k(\text{best}) &= \frac{k}{n}\sum_{i=k+1}^{n} \frac{1}{i - 1} \\
&\approx \frac{k}{n}\int_{k+1}^{n} \frac{1}{x - 1}\,dx \\
&= \frac{k}{n}\log\left(\frac{n - 1}{k}\right) \\
&\approx \frac{k}{n}\log\left(\frac{n}{k}\right)
\end{aligned}$$
Now, if we consider the function
$$g(x) = \frac{x}{n}\log\left(\frac{n}{x}\right)$$
then
$$g'(x) = \frac{1}{n}\log\left(\frac{n}{x}\right) - \frac{1}{n}$$
so
$$g'(x) = 0 \;\Rightarrow\; \log\left(\frac{n}{x}\right) = 1 \;\Rightarrow\; x = \frac{n}{e}$$
Thus, since $P_k(\text{best}) \approx g(k)$, we see that the best strategy of the type considered is to
let the first $n/e$ prizes go by and then accept the first one to appear that is better than
all of those. In addition, since $g(n/e) = 1/e$, the probability that this strategy selects
the best prize is approximately $1/e \approx .36788$.
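The approximation can be compared with the exact sum $P_k(\text{best}) = \frac{k}{n}\sum_{i=k+1}^{n}\frac{1}{i-1}$. The Python sketch below evaluates that sum for each cutoff $k$ from 1 to $n - 1$ and picks the maximizer; the function names are illustrative assumptions, and $k = 0$ is left out because the sum formula does not cover it.

```python
import math


def prob_best(n, k):
    """P_k(best) = (k / n) * sum_{i=k+1}^{n} 1 / (i - 1), for 1 <= k < n."""
    return k / n * sum(1 / (i - 1) for i in range(k + 1, n + 1))


def best_cutoff(n):
    """Return the k in 1..n-1 that maximizes P_k(best), with its probability."""
    k = max(range(1, n), key=lambda c: prob_best(n, c))
    return k, prob_best(n, k)
```

Calling `best_cutoff(100)` should return a cutoff close to $100/e$ and a probability close to $1/e$, in line with the analysis above.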


Remark Most people are quite surprised by the size of the probability of obtaining
the best prize, thinking that this probability would be close to 0 when $n$ is large.
However, even without going through the calculations, a little thought reveals that
the probability of obtaining the best prize can be made reasonably large. Consider
the strategy of letting half of the prizes go by and then selecting the first one to
appear that is better than all of those. The probability that a prize is actually selected
is the probability that the overall best is among the second half, and this is $\frac{1}{2}$. In
addition, given that a prize is selected, at the time of selection that prize would have been
the best of more than $n/2$ prizes to have appeared and would thus have probability
of at least $\frac{1}{2}$ of being the overall best. Hence, the strategy of letting the first half of
all prizes go by and then accepting the first one that is better than all of those prizes
has a probability greater than $\frac{1}{4}$ of obtaining the best prize.


Let U be a uniform random variable on (0, 1), and suppose that the conditional 
distribution of X, given that U = p, is binomial with parameters n and p. Find the 
probability mass function of X. 


Solution Conditioning on the value of $U$ gives
$$\begin{aligned}
P\{X = i\} &= \int_0^1 P\{X = i \mid U = p\}f_U(p)\,dp \\
&= \int_0^1 P\{X = i \mid U = p\}\,dp \\
&= \frac{n!}{i!(n - i)!}\int_0^1 p^i(1 - p)^{n-i}\,dp
\end{aligned}$$
Now, it can be shown (a probabilistic proof is given in Section 6.6) that
$$\int_0^1 p^i(1 - p)^{n-i}\,dp = \frac{i!(n - i)!}{(n + 1)!}$$
Hence, we obtain
$$P\{X = i\} = \frac{1}{n + 1}, \quad i = 0, \ldots, n$$
That is, we obtain the surprising result that if a coin whose probability of coming up
heads is uniformly distributed over (0, 1) is flipped $n$ times, then the number of heads
occurring is equally likely to be any of the values $0, \ldots, n$.

Because the preceding conditional distribution has such a nice form, it is worth
trying to find another argument to enhance our intuition as to why such a result
is true. To do so, let $U, U_1, \ldots, U_n$ be $n + 1$ independent uniform (0, 1) random
variables, and let $X$ denote the number of the random variables $U_1, \ldots, U_n$ that are
smaller than $U$. Since all the random variables $U, U_1, \ldots, U_n$ have the same
distribution, it follows that $U$ is equally likely to be the smallest, or second smallest, ..., or
largest of them; so $X$ is equally likely to be any of the values $0, 1, \ldots, n$. However,
given that $U = p$, the number of the $U_i$ that are less than $U$ is a binomial random
variable with parameters $n$ and $p$, thus establishing our previous result.


Example 5m


A random sample of $X$ balls is chosen from an urn that contains $n$ red and $m$ blue
balls. If $X$ is equally likely to be any of the values $1, \ldots, n$, find the probability that
all the balls in the sample are red.


Solution Conditioning on $X$ yields
$$P(\text{all balls are red}) = \sum_{i=1}^{n} P(\text{all balls are red} \mid X = i)P(X = i)$$
Now, given that the sample is of size $i$, each of the $\binom{n+m}{i}$ subsets of size $i$ is equally
likely to be the chosen set of balls. As $\binom{n}{i}$ of these subsets have all red balls, it follows
that $P\{\text{all balls are red} \mid X = i\} = \binom{n}{i}\Big/\binom{n+m}{i}$, and thus that
$$P(\text{all balls are red}) = \frac{1}{n}\sum_{i=1}^{n} \frac{\binom{n}{i}}{\binom{n+m}{i}}$$
However, though not obvious, it turns out that the preceding can be simplified, and
indeed yields the surprising result that
$$P(\text{all balls are red}) = \frac{1}{m + 1}, \quad \text{for all } n, m$$
To prove the preceding formula, we will not make use of our earlier result, but rather
we will use induction on $n$. When $n = 1$, the urn contains 1 red and $m$ blue balls and
so a random sample of size 1 will be red with probability $\frac{1}{m+1}$. So, assume the result
is true whenever the urn contains $n - 1$ red and $m$ blue balls and a random sample
whose size is equally likely to be any of $1, \ldots, n - 1$ is to be chosen. Now consider
the case of $n$ red and $m$ blue balls. Start by conditioning not on the value of $X$ but
only on whether or not $X = 1$. This yields
$$\begin{aligned}
P(\text{all balls are red}) &= P(\text{all red} \mid X = 1)P(X = 1) + P(\text{all red} \mid X > 1)P(X > 1) \\
&= \frac{n}{n + m}\,\frac{1}{n} + P(\text{all red} \mid X > 1)\frac{n - 1}{n}
\end{aligned}$$
Now, if $X > 1$ then in order for all balls in the sample to be red, the first one chosen
must be red, which occurs with probability $\frac{n}{n+m}$, and then all of the $X - 1$ remaining
balls in the sample must be red. But given that the first ball chosen is red, the
remaining $X - 1$ balls will be randomly selected from an urn containing $n - 1$ red
and $m$ blue balls. As $X - 1$, given that $X > 1$, is equally likely to be any of the values
$1, \ldots, n - 1$, it follows by the induction hypothesis that
$$P(\text{all balls are red} \mid X > 1) = \frac{n}{n + m}\,\frac{1}{m + 1}$$
Thus,
$$\begin{aligned}
P(\text{all balls are red}) &= \frac{1}{n + m} + \frac{n}{n + m}\,\frac{1}{m + 1}\,\frac{n - 1}{n} \\
&= \frac{1}{n + m}\left(1 + \frac{n - 1}{m + 1}\right) \\
&= \frac{1}{m + 1}
\end{aligned}$$
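Since the simplification to $\frac{1}{m+1}$ is not obvious, it may help to confirm it with exact rational arithmetic. The following Python sketch evaluates the sum $\frac{1}{n}\sum_{i=1}^{n}\binom{n}{i}\big/\binom{n+m}{i}$ directly; the function name is an illustrative assumption.

```python
from fractions import Fraction
from math import comb


def prob_all_red(n, m):
    """(1/n) * sum_{i=1}^{n} C(n, i) / C(n + m, i), computed exactly."""
    return sum(Fraction(comb(n, i), comb(n + m, i)) for i in range(1, n + 1)) / n
```

For example, `prob_all_red(3, 2)` averages $\frac{3}{5}$, $\frac{3}{10}$, and $\frac{1}{10}$, giving $\frac{1}{3} = \frac{1}{m+1}$.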


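Example 5n  Suppose that $X$ and $Y$ are independent continuous random variables having
densities $f_X$ and $f_Y$, respectively. Compute $P\{X < Y\}$.


Solution Conditioning on the value of $Y$ yields
$$\begin{aligned}
P\{X < Y\} &= \int_{-\infty}^{\infty} P\{X < Y \mid Y = y\}f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} P\{X < y \mid Y = y\}f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} P\{X < y\}f_Y(y)\,dy \quad \text{by independence} \\
&= \int_{-\infty}^{\infty} F_X(y)f_Y(y)\,dy
\end{aligned}$$
where
$$F_X(y) = \int_{-\infty}^{y} f_X(x)\,dx$$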
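To see the formula $P\{X < Y\} = \int F_X(y)f_Y(y)\,dy$ at work, the Python sketch below integrates it numerically with the midpoint rule. The choice of exponential distributions, the rates `lam` and `mu`, and the truncation of the integral at 50 are assumptions made only for illustration; for that choice the integral is known to equal $\lambda/(\lambda + \mu)$.

```python
import math


def prob_x_less_than_y(cdf_x, pdf_y, lower, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of F_X(y) f_Y(y) dy."""
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        y = lower + (i + 0.5) * h
        total += cdf_x(y) * pdf_y(y)
    return total * h


lam, mu = 2.0, 3.0  # hypothetical rates chosen for illustration
approx = prob_x_less_than_y(
    lambda y: 1 - math.exp(-lam * y),  # F_X for an exponential with rate lam
    lambda y: mu * math.exp(-mu * y),  # f_Y for an exponential with rate mu
    0.0,
    50.0,  # the tail beyond 50 is negligible for these rates
)
```

With these rates, `approx` should be very close to `lam / (lam + mu)`.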


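Example 5o  Suppose that $X$ and $Y$ are independent continuous random variables. Find the
distribution function and the density function of $X + Y$.


Solution By conditioning on the value of $Y$, we obtain
$$\begin{aligned}
P\{X + Y < a\} &= \int_{-\infty}^{\infty} P\{X + Y < a \mid Y = y\}f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} P\{X + y < a \mid Y = y\}f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} P\{X < a - y\}f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} F_X(a - y)f_Y(y)\,dy
\end{aligned}$$
Differentiation yields the density function of $X + Y$:
$$\begin{aligned}
f_{X+Y}(a) &= \frac{d}{da}\int_{-\infty}^{\infty} F_X(a - y)f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} \frac{d}{da}F_X(a - y)f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} f_X(a - y)f_Y(y)\,dy
\end{aligned}$$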


7.5.4. Conditional Variance 


Just as we have defined the conditional expectation of X given the value of Y, we 
can also define the conditional variance of X given that Y = y: 


$$\operatorname{Var}(X \mid Y) = E[(X - E[X \mid Y])^2 \mid Y]$$


That is, Var(X|Y) is equal to the (conditional) expected square of the difference 
between X and its (conditional) mean when the value of Y is given. In other words, 
Var(X|Y) is exactly analogous to the usual definition of variance, but now all expec- 
tations are conditional on the fact that Y is known. 

There is a very useful relationship between Var(X), the unconditional variance 
of X, and Var(X|Y), the conditional variance of X given Y, that can often be applied 
to compute Var(X). To obtain this relationship, note first that by the same reasoning 
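that yields $\operatorname{Var}(X) = E[X^2] - (E[X])^2$, we have
$$\operatorname{Var}(X \mid Y) = E[X^2 \mid Y] - (E[X \mid Y])^2$$
so
$$\begin{aligned}
E[\operatorname{Var}(X \mid Y)] &= E[E[X^2 \mid Y]] - E[(E[X \mid Y])^2] \\
&= E[X^2] - E[(E[X \mid Y])^2]
\end{aligned} \qquad (5.9)$$
Also, since $E[E[X \mid Y]] = E[X]$, we have
$$\operatorname{Var}(E[X \mid Y]) = E[(E[X \mid Y])^2] - (E[X])^2 \qquad (5.10)$$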


Hence, by adding Equations (5.9) and (5.10), we arrive at the following proposition. 


Proposition 5.2  The conditional variance formula
$$\operatorname{Var}(X) = E[\operatorname{Var}(X \mid Y)] + \operatorname{Var}(E[X \mid Y])$$
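The conditional variance formula can be verified exactly on any small discrete joint distribution. In the Python sketch below, the joint mass function `joint` is a made-up example, and the helper names are illustrative assumptions; the code computes $\operatorname{Var}(X)$ from the marginal of $X$ and compares it with $E[\operatorname{Var}(X \mid Y)] + \operatorname{Var}(E[X \mid Y])$.

```python
from fractions import Fraction

# Hypothetical joint pmf P{X = x, Y = y}, chosen only for illustration.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4), (2, 1): Fraction(1, 4),
}


def mean(pmf):
    return sum(v * p for v, p in pmf.items())


def variance(pmf):
    mu = mean(pmf)
    return sum((v - mu) ** 2 * p for v, p in pmf.items())


def conditional_pmf_of_x(joint, y):
    """Return the pmf of X given Y = y, together with P{Y = y}."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}, p_y


def both_sides(joint):
    """Return (Var(X), E[Var(X|Y)] + Var(E[X|Y]))."""
    marginal_x = {}
    for (x, _), p in joint.items():
        marginal_x[x] = marginal_x.get(x, 0) + p
    e_var, cond_means = 0, {}
    for y in {yy for (_, yy) in joint}:
        cond, p_y = conditional_pmf_of_x(joint, y)
        e_var += variance(cond) * p_y  # contributes to E[Var(X|Y)]
        cond_means[mean(cond)] = cond_means.get(mean(cond), 0) + p_y  # pmf of E[X|Y]
    return variance(marginal_x), e_var + variance(cond_means)
```

For this particular `joint`, both entries returned by `both_sides(joint)` equal $\frac{39}{64}$: here $E[\operatorname{Var}(X \mid Y)] = \frac{19}{32}$ and $\operatorname{Var}(E[X \mid Y]) = \frac{1}{64}$.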
Example 5p  Suppose that by any time $t$ the number of people who have arrived at a train depot
is a Poisson random variable with mean $\lambda t$. If the initial train arrives at the depot at a
time (independent of when the passengers arrive) that is uniformly distributed over
$(0, T)$, what are the mean and variance of the number of passengers who enter the
train?


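Solution For each $t \geq 0$, let $N(t)$ denote the number of arrivals by $t$, and let $Y$
denote the time at which the train arrives. The random variable of interest is then
$N(Y)$. Conditioning on $Y$ gives
$$\begin{aligned}
E[N(Y) \mid Y = t] &= E[N(t) \mid Y = t] \\
&= E[N(t)] \quad \text{by the independence of } Y \text{ and } N(t) \\
&= \lambda t \quad \text{since } N(t) \text{ is Poisson with mean } \lambda t
\end{aligned}$$
Hence,
$$E[N(Y) \mid Y] = \lambda Y$$
so taking expectations gives
$$E[N(Y)] = \lambda E[Y] = \frac{\lambda T}{2}$$
To obtain $\operatorname{Var}(N(Y))$, we use the conditional variance formula:
$$\begin{aligned}
\operatorname{Var}(N(Y) \mid Y = t) &= \operatorname{Var}(N(t) \mid Y = t) \\
&= \operatorname{Var}(N(t)) \quad \text{by independence} \\
&= \lambda t
\end{aligned}$$
Thus,
$$\operatorname{Var}(N(Y) \mid Y) = \lambda Y$$
$$E[N(Y) \mid Y] = \lambda Y$$
Hence, from the conditional variance formula,
$$\begin{aligned}
\operatorname{Var}(N(Y)) &= E[\lambda Y] + \operatorname{Var}(\lambda Y) \\
&= \lambda\frac{T}{2} + \lambda^2\frac{T^2}{12}
\end{aligned}$$
where we have used the fact that $\operatorname{Var}(Y) = T^2/12$.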


Example 5q  Variance of a sum of a random number of random variables


Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random
variables, and let $N$ be a nonnegative integer-valued random variable that is independent
of the sequence $X_i$, $i \geq 1$. To compute $\operatorname{Var}\left(\sum_{i=1}^{N} X_i\right)$, we condition on $N$:
$$E\left[\sum_{i=1}^{N} X_i \,\Big|\, N\right] = NE[X]$$
$$\operatorname{Var}\left(\sum_{i=1}^{N} X_i \,\Big|\, N\right) = N\operatorname{Var}(X)$$
The preceding result follows because, given $N$, $\sum_{i=1}^{N} X_i$ is just the sum of a fixed
number of independent random variables, so its expectation and variance are just
the sums of the individual means and variances, respectively. Hence, from the
conditional variance formula,
$$\operatorname{Var}\left(\sum_{i=1}^{N} X_i\right) = E[N]\operatorname{Var}(X) + (E[X])^2\operatorname{Var}(N)$$
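The formula $\operatorname{Var}\left(\sum_{i=1}^{N} X_i\right) = E[N]\operatorname{Var}(X) + (E[X])^2\operatorname{Var}(N)$ can be confirmed by brute-force enumeration when $N$ and the $X_i$ take only a few values. In the following Python sketch the mass functions `pmf_n` and `pmf_x` are invented for illustration, and the helper names are assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distributions chosen only for illustration.
pmf_n = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}
pmf_x = {1: Fraction(1, 3), 4: Fraction(2, 3)}


def moments(pmf):
    """Return (mean, variance) of a pmf given as {value: probability}."""
    m1 = sum(v * p for v, p in pmf.items())
    m2 = sum(v * v * p for v, p in pmf.items())
    return m1, m2 - m1 * m1


def random_sum_pmf(pmf_n, pmf_x):
    """Exact pmf of X_1 + ... + X_N by enumerating N and the X values."""
    out = {}
    for n, p_n in pmf_n.items():
        for xs in product(pmf_x, repeat=n):  # one empty tuple when n == 0
            p = p_n
            for x in xs:
                p *= pmf_x[x]
            out[sum(xs)] = out.get(sum(xs), 0) + p
    return out
```

Applying `moments` to `random_sum_pmf(pmf_n, pmf_x)` gives the mean and variance of the random sum directly, which can then be compared with $E[N]E[X]$ and with the right side of the formula.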
7.6 Conditional Expectation and Prediction


Sometimes a situation arises in which the value of a random variable X is observed 
and then, on the basis of the observed value, an attempt is made to predict the 
value of a second random variable Y. Let g(X) denote the predictor; that is, if X 
is observed to equal x, then g(x) is our prediction for the value of Y. Clearly, we 
would like to choose $g$ so that $g(X)$ tends to be close to $Y$. One possible criterion for
closeness is to choose $g$ so as to minimize $E[(Y - g(X))^2]$. We now show that, under
this criterion, the best possible predictor of $Y$ is $g(X) = E[Y \mid X]$.


Proposition 6.1
$$E[(Y - g(X))^2] \geq E[(Y - E[Y \mid X])^2]$$

Proof
$$\begin{aligned}
E[(Y - g(X))^2 \mid X] &= E[(Y - E[Y \mid X] + E[Y \mid X] - g(X))^2 \mid X] \\
&= E[(Y - E[Y \mid X])^2 \mid X] \\
&\quad + E[(E[Y \mid X] - g(X))^2 \mid X] \\
&\quad + 2E[(Y - E[Y \mid X])(E[Y \mid X] - g(X)) \mid X]
\end{aligned} \qquad (6.1)$$
However, given $X$, $E[Y \mid X] - g(X)$, being a function of $X$, can be treated as a
constant. Thus,
$$\begin{aligned}
E[(Y - E[Y \mid X])(E[Y \mid X] - g(X)) \mid X] &= (E[Y \mid X] - g(X))E[Y - E[Y \mid X] \mid X] \\
&= (E[Y \mid X] - g(X))(E[Y \mid X] - E[Y \mid X]) \\
&= 0
\end{aligned} \qquad (6.2)$$
Hence, from Equations (6.1) and (6.2), we obtain
$$E[(Y - g(X))^2 \mid X] \geq E[(Y - E[Y \mid X])^2 \mid X]$$
and the desired result follows by taking expectations of both sides of the preceding
expression.


Remark A second, more intuitive, although less rigorous, argument verifying Propo- 
sition 6.1 is as follows: It is straightforward to verify that E[(Y — c)?] is minimized 
at c = E[Y]. (See Theoretical Exercise 1.) Thus, if we want to predict the value of 
Y when there are no data available to use, the best possible prediction, in the sense 
of minimizing the mean square error, is to predict that Y will equal its mean. How- 
ever, if the value of the random variable X is observed to be x, then the prediction 
problem remains exactly as in the previous (no-data) case, with the exception that all 
probabilities and expectations are now conditional on the event that $X = x$. Hence,
the best prediction in this situation is to predict that $Y$ will equal its conditional
expected value given that $X = x$, thus establishing Proposition 6.1.


Example 6a  Suppose that the son of a man of height $x$ (in inches) attains a height that is normally
distributed with mean $x + 1$ and variance 4. What is the best prediction of the height
at full growth of the son of a man who is 6 feet tall?


Solution Formally, this model can be written as
$$Y = X + 1 + e$$
where $e$ is a normal random variable, independent of $X$, having mean 0 and variance
4. The $X$ and $Y$, of course, represent the heights of the man and his son, respectively.
The best prediction $E[Y \mid X = 72]$ is thus equal to
$$\begin{aligned}
E[Y \mid X = 72] &= E[X + 1 + e \mid X = 72] \\
&= 73 + E[e \mid X = 72] \\
&= 73 + E(e) \quad \text{by independence} \\
&= 73
\end{aligned}$$


Suppose that if a signal value $s$ is sent from location A, then the signal value received
at location B is normally distributed with parameters $(s, 1)$. If $S$, the value of the
signal sent at A, is normally distributed with parameters $(\mu, \sigma^2)$, what is the best
estimate of the signal sent if $R$, the value received at B, is equal to $r$?


Solution Let us start by computing the conditional density of $S$ given $R$. We have
$$\begin{aligned}
f_{S|R}(s \mid r) &= \frac{f_{S,R}(s, r)}{f_R(r)} \\
&= \frac{f_S(s)f_{R|S}(r \mid s)}{f_R(r)} \\
&= Ke^{-(s-\mu)^2/2\sigma^2}e^{-(r-s)^2/2}
\end{aligned}$$
where $K$ does not depend on $s$. Now,
$$\begin{aligned}
\frac{(s - \mu)^2}{2\sigma^2} + \frac{(r - s)^2}{2} &= s^2\left(\frac{1}{2\sigma^2} + \frac{1}{2}\right) - \left(\frac{\mu}{\sigma^2} + r\right)s + C_1 \\
&= \frac{1 + \sigma^2}{2\sigma^2}\left[s^2 - 2\left(\frac{\mu + r\sigma^2}{1 + \sigma^2}\right)s\right] + C_1 \\
&= \frac{1 + \sigma^2}{2\sigma^2}\left(s - \frac{\mu + r\sigma^2}{1 + \sigma^2}\right)^2 + C_2
\end{aligned}$$
where $C_1$ and $C_2$ do not depend on $s$. Hence,
$$f_{S|R}(s \mid r) = C\exp\left\{\frac{-\left[s - \dfrac{\mu + r\sigma^2}{1 + \sigma^2}\right]^2}{2\left(\dfrac{\sigma^2}{1 + \sigma^2}\right)}\right\}$$
where $C$ does not depend on $s$. Thus, we may conclude that the conditional
distribution of $S$, the signal sent, given that $r$ is received, is normal with mean and variance
now given by
$$E[S \mid R = r] = \frac{\mu + r\sigma^2}{1 + \sigma^2}$$
$$\operatorname{Var}(S \mid R = r) = \frac{\sigma^2}{1 + \sigma^2}$$
Consequently, from Proposition 6.1, given that the value received is $r$, the best
estimate, in the sense of minimizing the mean square error, for the signal sent is
$$E[S \mid R = r] = \frac{1}{1 + \sigma^2}\mu + \frac{\sigma^2}{1 + \sigma^2}r$$
Writing the conditional mean as we did previously is informative, for it shows that it
equals a weighted average of $\mu$, the a priori expected value of the signal, and $r$, the
value received. The relative weights given to $\mu$ and $r$ are in the same proportion to
each other as 1 (the conditional variance of the received signal when $s$ is sent) is to
$\sigma^2$ (the variance of the signal to be sent).
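The weighted-average form of the estimate is simple to evaluate. The following Python sketch, with an illustrative function name, returns the conditional mean and variance of $S$ given $R = r$ under the model of this example ($S$ normal with parameters $(\mu, \sigma^2)$, and $R$ given $S = s$ normal with parameters $(s, 1)$).

```python
def posterior_signal(mu, sigma2, r):
    """Return (E[S | R = r], Var(S | R = r)) for the signal model above."""
    weight_mu = 1 / (1 + sigma2)      # weight on the a priori mean mu
    weight_r = sigma2 / (1 + sigma2)  # weight on the received value r
    return weight_mu * mu + weight_r * r, sigma2 / (1 + sigma2)
```

When $\sigma^2 = 1$ the two weights are equal, so `posterior_signal(0.0, 1.0, 2.0)` places the estimate halfway between $\mu = 0$ and $r = 2$; as $\sigma^2$ grows, the estimate moves toward the received value $r$.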


In digital signal processing, raw continuous analog data $X$ must be quantized, or
discretized, in order to obtain a digital representation. In order to quantize the raw
data $X$, an increasing set of numbers $a_i$, $i = 0, \pm 1, \pm 2, \ldots$, such that $\lim_{i \to +\infty} a_i = \infty$ and
$\lim_{i \to -\infty} a_i = -\infty$ is fixed, and the raw data are then quantized according to the interval
$(a_i, a_{i+1}]$ in which $X$ lies. Let us denote by $y_i$ the discretized value when $X \in (a_i, a_{i+1}]$,
and let $Y$ denote the observed discretized value; that is,
$$Y = y_i \quad \text{if } a_i < X \leq a_{i+1}$$
The distribution of $Y$ is given by
$$P\{Y = y_i\} = F_X(a_{i+1}) - F_X(a_i)$$
Suppose now that we want to choose the values $y_i$, $i = 0, \pm 1, \pm 2, \ldots$ so as to
minimize $E[(X - Y)^2]$, the expected mean square difference between the raw data
and their quantized version.


(a) Find the optimal values $y_i$, $i = 0, \pm 1, \ldots$.

For the optimal quantizer $Y$, show that
(b) $E[Y] = E[X]$, so the mean square error quantizer preserves the input mean;
(c) $\operatorname{Var}(Y) = \operatorname{Var}(X) - E[(X - Y)^2]$.


Solution (a) For any quantizer $Y$, upon conditioning on the value of $Y$, we obtain
$$E[(X - Y)^2] = \sum_i E[(X - y_i)^2 \mid a_i < X \leq a_{i+1}]\,P\{a_i < X \leq a_{i+1}\}$$
Now, if we let
$$I = i \quad \text{if } a_i < X \leq a_{i+1}$$
then
$$E[(X - y_i)^2 \mid a_i < X \leq a_{i+1}] = E[(X - y_i)^2 \mid I = i]$$
and by Proposition 6.1, this quantity is minimized when
$$\begin{aligned}
y_i &= E[X \mid I = i] \\
&= E[X \mid a_i < X \leq a_{i+1}] \\
&= \int_{a_i}^{a_{i+1}} \frac{x f_X(x)\,dx}{F_X(a_{i+1}) - F_X(a_i)}
\end{aligned}$$
Now, since the optimal quantizer is given by $Y = E[X \mid I]$, it follows that

(b) $E[Y] = E[X]$
(c)
$$\begin{aligned}
\operatorname{Var}(X) &= E[\operatorname{Var}(X \mid I)] + \operatorname{Var}(E[X \mid I]) \\
&= E[E[(X - Y)^2 \mid I]] + \operatorname{Var}(Y) \\
&= E[(X - Y)^2] + \operatorname{Var}(Y)
\end{aligned}$$


It sometimes happens that the joint probability distribution of X and Y is not 
completely known; or if it is known, it is such that the calculation of E[Y|X = x] 
is mathematically intractable. If, however, the means and variances of X and Y and 
the correlation of X and Y are known, then we can at least determine the best linear 
predictor of Y with respect to X. 

To obtain the best linear predictor of $Y$ with respect to $X$, we need to choose $a$
and $b$ so as to minimize $E[(Y - (a + bX))^2]$. Now,
$$\begin{aligned}
E[(Y - (a + bX))^2] &= E[Y^2 - 2aY - 2bXY + a^2 + 2abX + b^2X^2] \\
&= E[Y^2] - 2aE[Y] - 2bE[XY] + a^2 + 2abE[X] + b^2E[X^2]
\end{aligned}$$
Taking partial derivatives, we obtain
$$\begin{aligned}
\frac{\partial}{\partial a}E[(Y - a - bX)^2] &= -2E[Y] + 2a + 2bE[X] \\
\frac{\partial}{\partial b}E[(Y - a - bX)^2] &= -2E[XY] + 2aE[X] + 2bE[X^2]
\end{aligned} \qquad (6.3)$$
Setting Equations (6.3) to 0 and solving for $a$ and $b$ yields the solutions
$$\begin{aligned}
b &= \frac{E[XY] - E[X]E[Y]}{E[X^2] - (E[X])^2} = \frac{\operatorname{Cov}(X, Y)}{\sigma_x^2} = \rho\frac{\sigma_y}{\sigma_x} \\
a &= E[Y] - bE[X] = E[Y] - \frac{\rho\sigma_y E[X]}{\sigma_x}
\end{aligned} \qquad (6.4)$$
where $\rho = \operatorname{Correlation}(X, Y)$, $\sigma_y^2 = \operatorname{Var}(Y)$, and $\sigma_x^2 = \operatorname{Var}(X)$. It is easy to
verify that the values of $a$ and $b$ from Equation (6.4) minimize $E[(Y - a - bX)^2]$;
thus, the best (in the sense of mean square error) linear predictor of $Y$ with respect
to $X$ is
$$\mu_y + \frac{\rho\sigma_y}{\sigma_x}(X - \mu_x)$$
where $\mu_y = E[Y]$ and $\mu_x = E[X]$.

The mean square error of this predictor is given by
$$\begin{aligned}
E\left[\left(Y - \mu_y - \rho\frac{\sigma_y}{\sigma_x}(X - \mu_x)\right)^2\right] &= E[(Y - \mu_y)^2] + \rho^2\frac{\sigma_y^2}{\sigma_x^2}E[(X - \mu_x)^2] - 2\rho\frac{\sigma_y}{\sigma_x}E[(Y - \mu_y)(X - \mu_x)] \\
&= \sigma_y^2 + \rho^2\sigma_y^2 - 2\rho^2\sigma_y^2 \\
&= \sigma_y^2(1 - \rho^2)
\end{aligned} \qquad (6.5)$$
We note from Equation (6.5) that if $\rho$ is near $+1$ or $-1$, then the mean square error
of the best linear predictor is near zero.


An example in which the conditional expectation of $Y$ given $X$ is linear in $X$, and
hence in which the best linear predictor of $Y$ with respect to $X$ is the best overall
predictor, is when $X$ and $Y$ have a bivariate normal distribution. For, as shown in
Example 5d of Chapter 6, in that case,
$$E[Y \mid X = x] = \mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x)$$
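Equations (6.4) and (6.5) can be illustrated on a finite set of points, treating $(X, Y)$ as equally likely to be any one of them. The Python sketch below is a minimal illustration under that assumption; the function name and the sample points are invented. It computes $a$ and $b$ from the moments, then compares the directly computed mean square error with $\sigma_y^2(1 - \rho^2)$.

```python
def best_linear_predictor(points):
    """Return (a, b, mse, formula) where a + bX minimizes E[(Y - a - bX)^2]
    when (X, Y) is equally likely to be any of the given points, mse is the
    resulting mean square error, and formula is Var(Y) * (1 - rho^2)."""
    n = len(points)
    mu_x = sum(x for x, _ in points) / n
    mu_y = sum(y for _, y in points) / n
    var_x = sum((x - mu_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mu_y) ** 2 for _, y in points) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in points) / n
    b = cov / var_x      # b = Cov(X, Y) / Var(X)
    a = mu_y - b * mu_x  # a = E[Y] - b E[X]
    mse = sum((y - a - b * x) ** 2 for x, y in points) / n
    rho_sq = cov ** 2 / (var_x * var_y)
    return a, b, mse, var_y * (1 - rho_sq)


# Hypothetical data chosen only for illustration.
sample_points = [(0, 1), (1, 3), (2, 2), (3, 5)]
```

For `sample_points`, the third and fourth returned values should coincide up to rounding, and nudging `a` or `b` away from the returned values should only increase the mean square error.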


7.7 Moment Generating Functions


The moment generating function $M(t)$ of the random variable $X$ is defined for all
real values of $t$ by
$$M(t) = E[e^{tX}] = \begin{cases} \displaystyle\sum_x e^{tx}p(x) & \text{if } X \text{ is discrete with mass function } p(x) \\[2ex] \displaystyle\int_{-\infty}^{\infty} e^{tx}f(x)\,dx & \text{if } X \text{ is continuous with density } f(x) \end{cases}$$
We call $M(t)$ the moment generating function because all of the moments of $X$ can
be obtained by successively differentiating $M(t)$ and then evaluating the result at
$t = 0$. For example,
$$\begin{aligned}
M'(t) &= \frac{d}{dt}E[e^{tX}] \\
&= E\left[\frac{d}{dt}(e^{tX})\right] \\
&= E[Xe^{tX}]
\end{aligned} \qquad (7.1)$$
where we have assumed that the interchange of the differentiation and expectation
operators is legitimate. That is, we have assumed that
$$\frac{d}{dt}\left[\sum_x e^{tx}p(x)\right] = \sum_x \frac{d}{dt}[e^{tx}p(x)]$$
in the discrete case and
$$\frac{d}{dt}\left[\int e^{tx}f(x)\,dx\right] = \int \frac{d}{dt}[e^{tx}f(x)]\,dx$$
in the continuous case. This assumption can almost always be justified and, indeed, is
valid for all of the distributions considered in this book. Hence, from Equation (7.1),
evaluated at $t = 0$, we obtain
$$M'(0) = E[X]$$
Similarly,
$$\begin{aligned}
M''(t) &= \frac{d}{dt}M'(t) \\
&= \frac{d}{dt}E[Xe^{tX}] \\
&= E\left[\frac{d}{dt}(Xe^{tX})\right] \\
&= E[X^2e^{tX}]
\end{aligned}$$
Thus,
$$M''(0) = E[X^2]$$
In general, the $n$th derivative of $M(t)$ is given by
$$M^n(t) = E[X^ne^{tX}] \quad n \geq 1$$
implying that
$$M^n(0) = E[X^n] \quad n \geq 1$$


We now compute M(t) for some common distributions. 


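Example 7a  Binomial distribution with parameters $n$ and $p$
If $X$ is a binomial random variable with parameters $n$ and $p$, then
$$\begin{aligned}
M(t) &= E[e^{tX}] \\
&= \sum_{k=0}^{n} e^{tk}\binom{n}{k}p^k(1 - p)^{n-k} \\
&= \sum_{k=0}^{n} \binom{n}{k}(pe^t)^k(1 - p)^{n-k} \\
&= (pe^t + 1 - p)^n
\end{aligned}$$
where the last equality follows from the binomial theorem. Differentiation yields
$$M'(t) = n(pe^t + 1 - p)^{n-1}pe^t$$
Thus,
$$E[X] = M'(0) = np$$
Differentiating a second time yields
$$M''(t) = n(n - 1)(pe^t + 1 - p)^{n-2}(pe^t)^2 + n(pe^t + 1 - p)^{n-1}pe^t$$
so
$$E[X^2] = M''(0) = n(n - 1)p^2 + np$$
The variance of $X$ is given by
$$\begin{aligned}
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 \\
&= n(n - 1)p^2 + np - n^2p^2 \\
&= np(1 - p)
\end{aligned}$$
verifying the result obtained previously.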
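The relations $M'(0) = E[X]$ and $M''(0) = E[X^2]$ can also be checked numerically, without differentiating by hand. The following Python sketch approximates the two derivatives of the binomial moment generating function at $t = 0$ with central differences; the function names and the step size `h` are illustrative assumptions, and the results are approximations rather than exact values.

```python
import math


def binomial_mgf(t, n, p):
    """M(t) = (p e^t + 1 - p)^n."""
    return (p * math.exp(t) + 1 - p) ** n


def first_two_moments(mgf, h=1e-4):
    """Approximate M'(0) and M''(0) with central differences."""
    m1 = (mgf(h) - mgf(-h)) / (2 * h)
    m2 = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / (h * h)
    return m1, m2


# n = 10 and p = 0.3 are arbitrary illustrative parameters.
m1, m2 = first_two_moments(lambda t: binomial_mgf(t, 10, 0.3))
```

With $n = 10$ and $p = 0.3$, `m1` should be close to $np = 3$ and `m2 - m1**2` close to $np(1 - p) = 2.1$.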
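Example 7b  Poisson distribution with mean $\lambda$


If $X$ is a Poisson random variable with parameter $\lambda$, then
$$\begin{aligned}
M(t) &= E[e^{tX}] \\
&= \sum_{n=0}^{\infty} \frac{e^{tn}e^{-\lambda}\lambda^n}{n!} \\
&= e^{-\lambda}\sum_{n=0}^{\infty} \frac{(\lambda e^t)^n}{n!} \\
&= e^{-\lambda}e^{\lambda e^t} \\
&= \exp\{\lambda(e^t - 1)\}
\end{aligned}$$
Differentiation yields
$$M'(t) = \lambda e^t\exp\{\lambda(e^t - 1)\}$$
$$M''(t) = (\lambda e^t)^2\exp\{\lambda(e^t - 1)\} + \lambda e^t\exp\{\lambda(e^t - 1)\}$$
Thus,
$$\begin{aligned}
E[X] &= M'(0) = \lambda \\
E[X^2] &= M''(0) = \lambda^2 + \lambda \\
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 \\
&= \lambda
\end{aligned}$$
Hence, both the mean and the variance of the Poisson random variable equal $\lambda$.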
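Example 7c  Exponential distribution with parameter $\lambda$
$$\begin{aligned}
M(t) &= E[e^{tX}] \\
&= \int_0^{\infty} e^{tx}\lambda e^{-\lambda x}\,dx \\
&= \lambda\int_0^{\infty} e^{-(\lambda - t)x}\,dx \\
&= \frac{\lambda}{\lambda - t} \quad \text{for } t < \lambda
\end{aligned}$$
We note from this derivation that for the exponential distribution, $M(t)$ is defined
only for values of $t$ less than $\lambda$. Differentiation of $M(t)$ yields
$$M'(t) = \frac{\lambda}{(\lambda - t)^2} \qquad M''(t) = \frac{2\lambda}{(\lambda - t)^3}$$
Hence,
$$E[X] = M'(0) = \frac{1}{\lambda} \qquad E[X^2] = M''(0) = \frac{2}{\lambda^2}$$
The variance of $X$ is given by
$$\begin{aligned}
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 \\
&= \frac{1}{\lambda^2}
\end{aligned}$$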


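Normal distribution
We first compute the moment generating function of a standard normal random
variable with parameters 0 and 1. Letting $Z$ be such a random variable, we have
$$\begin{aligned}
M_Z(t) &= E[e^{tZ}] \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{tx}e^{-x^2/2}\,dx \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\left\{-\frac{(x^2 - 2tx)}{2}\right\}dx \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\left\{-\frac{(x - t)^2}{2} + \frac{t^2}{2}\right\}dx \\
&= e^{t^2/2}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(x-t)^2/2}\,dx \\
&= e^{t^2/2}
\end{aligned}$$
Hence, the moment generating function of the standard normal random variable $Z$
is given by $M_Z(t) = e^{t^2/2}$. To obtain the moment generating function of an arbitrary
normal random variable, we recall (see Section 5.4) that $X = \mu + \sigma Z$ will have
a normal distribution with parameters $\mu$ and $\sigma^2$ whenever $Z$ is a standard normal
random variable. Hence, the moment generating function of such a random variable
is given by
$$\begin{aligned}
M_X(t) &= E[e^{tX}] \\
&= E[e^{t(\mu + \sigma Z)}] \\
&= E[e^{t\mu}e^{t\sigma Z}] \\
&= e^{t\mu}E[e^{t\sigma Z}] \\
&= e^{t\mu}M_Z(t\sigma) \\
&= e^{t\mu}e^{(t\sigma)^2/2} \\
&= \exp\left\{\frac{\sigma^2t^2}{2} + \mu t\right\}
\end{aligned}$$
By differentiating, we obtain
$$M_X'(t) = (\mu + t\sigma^2)\exp\left\{\frac{\sigma^2t^2}{2} + \mu t\right\}$$
$$M_X''(t) = (\mu + t\sigma^2)^2\exp\left\{\frac{\sigma^2t^2}{2} + \mu t\right\} + \sigma^2\exp\left\{\frac{\sigma^2t^2}{2} + \mu t\right\}$$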
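Table 7.1 Discrete Probability Distribution.

| Distribution | Probability mass function, $p(x)$ | Moment generating function, $M(t)$ | Mean | Variance |
|---|---|---|---|---|
| Binomial with parameters $n$, $p$; $0 \leq p \leq 1$ | $\binom{n}{x}p^x(1 - p)^{n-x}$, $x = 0, 1, \ldots, n$ | $(pe^t + 1 - p)^n$ | $np$ | $np(1 - p)$ |
| Poisson with parameter $\lambda > 0$ | $e^{-\lambda}\dfrac{\lambda^x}{x!}$, $x = 0, 1, 2, \ldots$ | $\exp\{\lambda(e^t - 1)\}$ | $\lambda$ | $\lambda$ |
| Geometric with parameter $0 \leq p \leq 1$ | $p(1 - p)^{x-1}$, $x = 1, 2, \ldots$ | $\dfrac{pe^t}{1 - (1 - p)e^t}$ | $\dfrac{1}{p}$ | $\dfrac{1 - p}{p^2}$ |
| Negative binomial with parameters $r$, $p$; $0 \leq p \leq 1$ | $\binom{n-1}{r-1}p^r(1 - p)^{n-r}$, $n = r, r + 1, \ldots$ | $\left[\dfrac{pe^t}{1 - (1 - p)e^t}\right]^r$ | $\dfrac{r}{p}$ | $\dfrac{r(1 - p)}{p^2}$ |
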
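Thus,
$$E[X] = M'(0) = \mu$$
$$E[X^2] = M''(0) = \mu^2 + \sigma^2$$
implying that
$$\begin{aligned}
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 \\
&= \sigma^2
\end{aligned}$$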


Tables 7.1 and 72 (on page 364) give the moment generating functions for some 
common discrete and continuous distributions. 

An important property of moment generating functions is that the moment generating
function of the sum of independent random variables equals the product of
the individual moment generating functions. To prove this, suppose that X and Y are
independent and have moment generating functions $M_X(t)$ and $M_Y(t)$, respectively.
Then $M_{X+Y}(t)$, the moment generating function of X + Y, is given by

$$
\begin{aligned}
M_{X+Y}(t) &= E[e^{t(X+Y)}] \\
&= E[e^{tX} e^{tY}] \\
&= E[e^{tX}] E[e^{tY}] \\
&= M_X(t) M_Y(t)
\end{aligned}
$$

where the next-to-last equality follows from Proposition 4.1, since X and Y are
independent.
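
As a small numerical illustration of the product property (a sketch under assumed inputs, not part of the original text), take X to be the value of a fair die and Y an independent fair coin scored 0 or 1. The helper names `mgf` and `pmf_of_sum` and the evaluation point `t = 0.4` are hypothetical choices.

```python
import math
from itertools import product

def mgf(pmf, t):
    """MGF of a discrete random variable given as {value: probability}."""
    return sum(prob * math.exp(t * x) for x, prob in pmf.items())

def pmf_of_sum(pmf_x, pmf_y):
    """pmf of X + Y when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        out[x + y] = out.get(x + y, 0.0) + px * py
    return out

die = {k: 1 / 6 for k in range(1, 7)}   # a fair die (assumed example)
coin = {0: 0.5, 1: 0.5}                 # a fair coin scored 0 or 1 (assumed example)

t = 0.4
lhs = mgf(pmf_of_sum(die, coin), t)     # M_{X+Y}(t), computed from the pmf of X + Y
rhs = mgf(die, t) * mgf(coin, t)        # M_X(t) M_Y(t)
```

The two quantities agree up to floating-point rounding, as the proof above predicts.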
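Another important result is that the moment generating function uniquely determines
the distribution. That is, if $M_X(t)$ exists and is finite in some region about t = 0,
then the distribution of X is uniquely determined. For instance, if

$$
M_X(t) = \left(\frac{1}{2}\right)^{10} (e^t + 1)^{10},
$$

then it follows from Table 7.1 that X is a binomial random variable with parameters
10 and $\frac{1}{2}$.

Table 7.2 Continuous Probability Distribution.

| Distribution | Probability density function, f(x) | Moment generating function, M(t) | Mean | Variance |
|---|---|---|---|---|
| Uniform over (a, b) | $\dfrac{1}{b - a}$ for $a < x < b$; 0 otherwise | $\dfrac{e^{tb} - e^{ta}}{t(b - a)}$ | $\dfrac{a + b}{2}$ | $\dfrac{(b - a)^2}{12}$ |
| Exponential with parameter $\lambda > 0$ | $\lambda e^{-\lambda x}$ for $x \ge 0$; 0 for $x < 0$ | $\dfrac{\lambda}{\lambda - t}$ | $\dfrac{1}{\lambda}$ | $\dfrac{1}{\lambda^2}$ |
| Gamma with parameters $(s, \lambda)$, $\lambda > 0$ | $\dfrac{\lambda e^{-\lambda x} (\lambda x)^{s - 1}}{\Gamma(s)}$ for $x \ge 0$; 0 for $x < 0$ | $\left(\dfrac{\lambda}{\lambda - t}\right)^s$ | $\dfrac{s}{\lambda}$ | $\dfrac{s}{\lambda^2}$ |
| Normal with parameters $(\mu, \sigma^2)$ | $\dfrac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2/2\sigma^2}$, $-\infty < x < \infty$ | $\exp\left\{\mu t + \dfrac{\sigma^2 t^2}{2}\right\}$ | $\mu$ | $\sigma^2$ |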


Suppose that the moment generating function of a random variable X is given by
$M(t) = e^{3(e^t - 1)}$. What is P{X = 0}?


Solution We see from Table 7.1 that $M(t) = e^{3(e^t - 1)}$ is the moment generating
function of a Poisson random variable with mean 3. Hence, by the one-to-one
correspondence between moment generating functions and distribution functions, it follows
that X must be a Poisson random variable with mean 3. Thus, $P\{X = 0\} = e^{-3}$.


Sums of independent binomial random variables
If X and Y are independent binomial random variables with parameters (n, p) and
(m, p), respectively, what is the distribution of X + Y?


Solution The moment generating function of X + Y is given by

$$
\begin{aligned}
M_{X+Y}(t) = M_X(t) M_Y(t) &= (pe^t + 1 - p)^n (pe^t + 1 - p)^m \\
&= (pe^t + 1 - p)^{m+n}
\end{aligned}
$$

However, $(pe^t + 1 - p)^{m+n}$ is the moment generating function of a binomial
random variable having parameters m + n and p. Thus, this must be the distribution
of X + Y.
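
The same conclusion can be confirmed directly from the probability mass functions. The sketch below is an illustration, not part of the original text: it convolves two binomial mass functions in exact rational arithmetic and compares the result with the binomial (m + n, p) mass function. The helper names and the values n = 4, m = 6, p = 1/3 are assumptions.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """List of P{X = k}, k = 0..n, for a binomial (n, p) random variable."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(a, b):
    """pmf of the sum of two independent nonnegative integer-valued variables."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

n, m, p = 4, 6, Fraction(1, 3)          # illustrative values (assumed)
sum_pmf = convolve(binomial_pmf(n, p), binomial_pmf(m, p))
direct_pmf = binomial_pmf(n + m, p)     # binomial with parameters m + n and p
```

Because `Fraction` arithmetic is exact, the two lists are equal term by term rather than merely close.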


Sums of independent Poisson random variables
Calculate the distribution of X + Y when X and Y are independent Poisson random
variables with respective means $\lambda_1$ and $\lambda_2$.


Solution

$$
\begin{aligned}
M_{X+Y}(t) &= M_X(t) M_Y(t) \\
&= \exp\{\lambda_1(e^t - 1)\} \exp\{\lambda_2(e^t - 1)\} \\
&= \exp\{(\lambda_1 + \lambda_2)(e^t - 1)\}
\end{aligned}
$$

Hence, X + Y is Poisson distributed with mean $\lambda_1 + \lambda_2$, verifying the result given
in Example 3e of Chapter 6.


Sums of independent normal random variables


Show that if X and Y are independent normal random variables with respective
parameters $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$, then X + Y is normal with mean $\mu_1 + \mu_2$ and
variance $\sigma_1^2 + \sigma_2^2$.


Solution

$$
\begin{aligned}
M_{X+Y}(t) &= M_X(t) M_Y(t) \\
&= \exp\left\{\frac{\sigma_1^2 t^2}{2} + \mu_1 t\right\} \exp\left\{\frac{\sigma_2^2 t^2}{2} + \mu_2 t\right\} \\
&= \exp\left\{\frac{(\sigma_1^2 + \sigma_2^2) t^2}{2} + (\mu_1 + \mu_2) t\right\}
\end{aligned}
$$

which is the moment generating function of a normal random variable with mean
$\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. The desired result then follows because the moment
generating function uniquely determines the distribution.


Example 7i

Compute the moment generating function of a chi-squared random variable with n
degrees of freedom.


Solution We can represent such a random variable as

$$
Z_1^2 + \cdots + Z_n^2
$$

where $Z_1, \ldots, Z_n$ are independent standard normal random variables. Let M(t) be
its moment generating function. Then, by the preceding,

$$
M(t) = (E[e^{tZ^2}])^n
$$

where Z is a standard normal random variable. Now,

$$
\begin{aligned}
E[e^{tZ^2}] &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx^2} e^{-x^2/2}\,dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2\sigma^2}\,dx \qquad \text{where } \sigma^2 = (1 - 2t)^{-1} \\
&= \sigma \\
&= (1 - 2t)^{-1/2}
\end{aligned}
$$

where the next-to-last equality uses the fact that the normal density with mean 0 and
variance $\sigma^2$ integrates to 1. Therefore,

$$
M(t) = (1 - 2t)^{-n/2}
$$
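
The closed form for the chi-squared moment generating function can be compared with a direct numerical evaluation of the defining integral. This is a rough sketch rather than part of the original text: it uses a midpoint rule on a truncated interval, valid only for t < 1/2, and the function name `mgf_z_squared`, the truncation width, the step count, and the values of t and n are all assumptions.

```python
import math

def mgf_z_squared(t, half_width=15.0, steps=30000):
    """Midpoint-rule approximation of E[exp(t Z^2)] for standard normal Z, t < 1/2."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += math.exp(t * x * x - x * x / 2)   # e^{t x^2} e^{-x^2/2}
    return total * h / math.sqrt(2 * math.pi)

t, n = 0.2, 5                              # illustrative values (assumed)
numeric = mgf_z_squared(t) ** n            # (E[e^{t Z^2}])^n
closed_form = (1 - 2 * t) ** (-n / 2)      # (1 - 2t)^{-n/2}
```

For t closer to 1/2 the integrand decays more slowly, so a larger `half_width` would be needed to keep the truncation error small.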


Moment generating function of the sum of a random number of random variables 


Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random
variables, and let N be a nonnegative, integer-valued random variable that is
independent of the sequence $X_i$, $i \ge 1$. We want to compute the moment generating
function of

$$
Y = \sum_{i=1}^{N} X_i
$$

(In Example 5d, Y was interpreted as the amount of money spent in a store on a
given day when both the amount spent by a customer and the number of customers
are random variables.)
To compute the moment generating function of Y, we first condition on N as
follows:

$$
\begin{aligned}
E\left[\exp\left\{t \sum_{1}^{N} X_i\right\} \,\Big|\, N = n\right] &= E\left[\exp\left\{t \sum_{1}^{n} X_i\right\} \,\Big|\, N = n\right] \\
&= E\left[\exp\left\{t \sum_{1}^{n} X_i\right\}\right] \\
&= [M_X(t)]^n
\end{aligned}
$$

where

$$
M_X(t) = E[e^{tX_i}]
$$

Hence,

$$
E[e^{tY} \mid N] = (M_X(t))^N
$$

Thus,

$$
M_Y(t) = E[(M_X(t))^N]
$$

The moments of Y can now be obtained upon differentiation, as follows:

$$
M_Y'(t) = E[N (M_X(t))^{N-1} M_X'(t)]
$$

So

$$
\begin{aligned}
E[Y] &= M_Y'(0) \\
&= E[N (M_X(0))^{N-1} M_X'(0)] \\
&= E[N E[X]] \qquad (7.2) \\
&= E[N] E[X]
\end{aligned}
$$

verifying the result of Example 5d. (In this last set of equalities, we have used the
fact that $M_X(0) = E[e^{0X}] = 1$.)
Also,

$$
M_Y''(t) = E[N(N - 1)(M_X(t))^{N-2} (M_X'(t))^2 + N (M_X(t))^{N-1} M_X''(t)]
$$

so

$$
\begin{aligned}
E[Y^2] &= M_Y''(0) \\
&= E[N(N - 1)(E[X])^2 + N E[X^2]] \\
&= (E[X])^2 (E[N^2] - E[N]) + E[N] E[X^2] \qquad (7.3) \\
&= E[N](E[X^2] - (E[X])^2) + (E[X])^2 E[N^2] \\
&= E[N] \mathrm{Var}(X) + (E[X])^2 E[N^2]
\end{aligned}
$$

Hence, from Equations (7.2) and (7.3), we have

$$
\begin{aligned}
\mathrm{Var}(Y) &= E[N] \mathrm{Var}(X) + (E[X])^2 (E[N^2] - (E[N])^2) \\
&= E[N] \mathrm{Var}(X) + (E[X])^2 \mathrm{Var}(N)
\end{aligned}
$$
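
The formulas for E[Y] and Var(Y) can be checked on a small example by building the exact distribution of Y. The sketch below is illustrative only: the two mass functions `pmf_n` and `pmf_x` are made-up inputs, and the helper names are hypothetical. It conditions on N in the same way as the derivation above and uses exact rational arithmetic.

```python
from fractions import Fraction as F

def convolve(a, b):
    """pmf of the sum of two independent variables given as {value: prob} dicts."""
    out = {}
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] = out.get(x + y, F(0)) + px * py
    return out

def mean_var(pmf):
    """Return (mean, variance) of a pmf given as {value: prob}."""
    m = sum(x * p for x, p in pmf.items())
    return m, sum(x * x * p for x, p in pmf.items()) - m * m

pmf_n = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}   # hypothetical number of customers N
pmf_x = {1: F(1, 3), 2: F(1, 3), 5: F(1, 3)}   # hypothetical amount spent per customer

# Build the pmf of Y = X_1 + ... + X_N by conditioning on N = n.
pmf_y, n_fold = {}, {0: F(1)}                  # n_fold is the pmf of X_1 + ... + X_n
for n in range(max(pmf_n) + 1):
    for y, p in n_fold.items():
        pmf_y[y] = pmf_y.get(y, F(0)) + pmf_n.get(n, F(0)) * p
    n_fold = convolve(n_fold, pmf_x)

(en, vn), (ex, vx), (ey, vy) = mean_var(pmf_n), mean_var(pmf_x), mean_var(pmf_y)
# ey should equal en * ex, and vy should equal en * vx + ex**2 * vn.
```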


Example
Let Y denote a uniform random variable on (0, 1), and suppose that conditional on
Y = p, the random variable X has a binomial distribution with parameters n and p.
In Example 5k, we showed that X is equally likely to take on any of the values
0, 1, ..., n. Establish this result by using moment generating functions.


Solution To compute the moment generating function of X, start by conditioning
on the value of Y. Using the formula for the binomial moment generating function
gives

$$
E[e^{tX} \mid Y = p] = (pe^t + 1 - p)^n
$$

Now, Y is uniform on (0, 1), so, upon taking expectations, we obtain

$$
\begin{aligned}
E[e^{tX}] &= \int_0^1 (pe^t + 1 - p)^n\,dp \\
&= \frac{1}{e^t - 1} \int_1^{e^t} y^n\,dy \qquad \text{(by the substitution } y = pe^t + 1 - p\text{)} \\
&= \frac{1}{n + 1} \cdot \frac{e^{t(n+1)} - 1}{e^t - 1} \\
&= \frac{1}{n + 1} (1 + e^t + e^{2t} + \cdots + e^{nt})
\end{aligned}
$$

Because the preceding is the moment generating function of a random variable that
is equally likely to be any of the values 0, 1, ..., n, the desired result follows from the
fact that the moment generating function of a random variable uniquely determines
its distribution.
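
The conclusion can also be verified by integrating the conditional binomial probabilities over p directly. The following sketch is an illustration outside the original text: it expands $(1 - p)^{n-k}$ by the binomial theorem and integrates term by term in exact arithmetic. The function name `mixed_binomial_pmf` and the choice n = 6 are assumptions.

```python
from fractions import Fraction
from math import comb

def mixed_binomial_pmf(n, k):
    """P{X = k} when X | Y = p is binomial (n, p) and Y is uniform on (0, 1).

    Uses (1 - p)^(n - k) = sum_j C(n - k, j) (-1)^j p^j and the fact that
    the integral of p^(k + j) over (0, 1) is 1 / (k + j + 1).
    """
    integral = sum(
        Fraction((-1) ** j * comb(n - k, j), k + j + 1) for j in range(n - k + 1)
    )
    return comb(n, k) * integral

n = 6                                            # illustrative value (assumed)
pmf = [mixed_binomial_pmf(n, k) for k in range(n + 1)]
# Every entry should equal 1 / (n + 1).
```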


7.7.1 Joint Moment Generating Functions 


It is also possible to define the joint moment generating function of two or more
random variables. This is done as follows: For any n random variables $X_1, \ldots, X_n$,
the joint moment generating function, $M(t_1, \ldots, t_n)$, is defined, for all real values of
$t_1, \ldots, t_n$, by

$$
M(t_1, \ldots, t_n) = E[e^{t_1 X_1 + \cdots + t_n X_n}]
$$

The individual moment generating functions can be obtained from $M(t_1, \ldots, t_n)$ by
letting all but one of the $t_j$'s be 0. That is,

$$
M_{X_i}(t) = E[e^{tX_i}] = M(0, \ldots, 0, t, 0, \ldots, 0)
$$

where the t is in the ith place.

It can be proven (although the proof is too advanced for this text) that the joint
moment generating function $M(t_1, \ldots, t_n)$ uniquely determines the joint distribution
of $X_1, \ldots, X_n$. This result can then be used to prove that the n random variables
$X_1, \ldots, X_n$ are independent if and only if

$$
M(t_1, \ldots, t_n) = M_{X_1}(t_1) \cdots M_{X_n}(t_n) \tag{7.4}
$$

For the proof in one direction, if the n random variables are independent, then

$$
\begin{aligned}
M(t_1, \ldots, t_n) &= E[e^{t_1 X_1 + \cdots + t_n X_n}] \\
&= E[e^{t_1 X_1} \cdots e^{t_n X_n}] \\
&= E[e^{t_1 X_1}] \cdots E[e^{t_n X_n}] \qquad \text{by independence} \\
&= M_{X_1}(t_1) \cdots M_{X_n}(t_n)
\end{aligned}
$$

For the proof in the other direction, if Equation (7.4) is satisfied, then the joint
moment generating function $M(t_1, \ldots, t_n)$ is the same as the joint moment generating
function of n independent random variables, the ith of which has the same distribution
as $X_i$. As the joint moment generating function uniquely determines the joint
distribution, this must be the joint distribution; hence, the random variables are
independent.


Let X and Y be independent normal random variables, each with mean $\mu$ and
variance $\sigma^2$. In Example 7a of Chapter 6, we showed that X + Y and X - Y are
independent. Let us now establish that X + Y and X - Y are independent by
computing their joint moment generating function:

$$
\begin{aligned}
E[e^{t(X+Y) + s(X-Y)}] &= E[e^{(t+s)X + (t-s)Y}] \\
&= E[e^{(t+s)X}] E[e^{(t-s)Y}] \\
&= e^{\mu(t+s) + \sigma^2 (t+s)^2/2}\, e^{\mu(t-s) + \sigma^2 (t-s)^2/2} \\
&= e^{2\mu t + \sigma^2 t^2}\, e^{\sigma^2 s^2}
\end{aligned}
$$

But we recognize the preceding as the joint moment generating function of two
independent normal random variables, the first with mean $2\mu$ and variance $2\sigma^2$ and
the second with mean 0 and variance $2\sigma^2$. Because the joint moment
generating function uniquely determines the joint distribution, it follows that X + Y
and X - Y are independent normal random variables.


In the next example, we use the joint moment generating function to verify a 
result that was established in Example 2b of Chapter 6. 


Suppose that the number of events that occur is a Poisson random variable with
mean $\lambda$ and that each event is independently counted with probability p. Show that
the number of counted events and the number of uncounted events are independent
Poisson random variables with respective means $\lambda p$ and $\lambda(1 - p)$.


Solution Let X denote the total number of events, and let $X_c$ denote the number of
them that are counted. To compute the joint moment generating function of $X_c$, the
number of events that are counted, and $X - X_c$, the number that are uncounted,
start by conditioning on X to obtain

$$
\begin{aligned}
E[e^{sX_c + t(X - X_c)} \mid X = n] &= e^{tn} E[e^{(s-t)X_c} \mid X = n] \\
&= e^{tn} (pe^{s-t} + 1 - p)^n \\
&= (pe^s + (1 - p)e^t)^n
\end{aligned}
$$

which follows because, conditional on X = n, $X_c$ is a binomial random variable with
parameters n and p. Hence,

$$
E[e^{sX_c + t(X - X_c)} \mid X] = (pe^s + (1 - p)e^t)^X
$$

Taking expectations of both sides of this equation yields

$$
E[e^{sX_c + t(X - X_c)}] = E[(pe^s + (1 - p)e^t)^X]
$$

Now, since X is Poisson with mean $\lambda$, it follows that $E[e^{tX}] = e^{\lambda(e^t - 1)}$. Therefore, for
any positive value a we see (by letting $a = e^t$) that $E[a^X] = e^{\lambda(a - 1)}$. Thus,

$$
\begin{aligned}
E[e^{sX_c + t(X - X_c)}] &= e^{\lambda(pe^s + (1 - p)e^t - 1)} \\
&= e^{\lambda p(e^s - 1)}\, e^{\lambda(1 - p)(e^t - 1)}
\end{aligned}
$$

As the preceding is the joint moment generating function of independent Poisson
random variables with respective means $\lambda p$ and $\lambda(1 - p)$, the result is proven.
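
The factorization can be seen at the level of probabilities as well. The sketch below, which is an illustration and not part of the original text, computes the joint mass function of the counted and uncounted numbers by conditioning on the total and compares it with the product of two Poisson mass functions. The function names and the values $\lambda = 3$, p = 0.4 are assumptions.

```python
import math

def poisson_pmf(k, mean):
    """P{N = k} for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def joint_pmf(i, j, lam, p):
    """P{X_c = i, X - X_c = j}: i + j events occur and exactly i of them are counted."""
    return poisson_pmf(i + j, lam) * math.comb(i + j, i) * p**i * (1 - p) ** j

lam, p = 3.0, 0.4   # illustrative values (assumed)
max_gap = max(
    abs(joint_pmf(i, j, lam, p) - poisson_pmf(i, lam * p) * poisson_pmf(j, lam * (1 - p)))
    for i in range(15)
    for j in range(15)
)
# max_gap should be zero up to floating-point rounding.
```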


7.8 Additional Properties of Normal Random Variables 
7.8.1 The Multivariate Normal Distribution 


Let $Z_1, \ldots, Z_n$ be a set of n independent standard normal random variables. If, for
some constants $a_{ij}$, $1 \le i \le m$, $1 \le j \le n$, and $\mu_i$, $1 \le i \le m$,

$$
\begin{aligned}
X_1 &= a_{11} Z_1 + \cdots + a_{1n} Z_n + \mu_1 \\
X_2 &= a_{21} Z_1 + \cdots + a_{2n} Z_n + \mu_2 \\
&\;\;\vdots \\
X_i &= a_{i1} Z_1 + \cdots + a_{in} Z_n + \mu_i \\
&\;\;\vdots \\
X_m &= a_{m1} Z_1 + \cdots + a_{mn} Z_n + \mu_m
\end{aligned}
$$

then the random variables $X_1, \ldots, X_m$ are said to have a multivariate normal
distribution.

From the fact that the sum of independent normal random variables is itself a
normal random variable, it follows that each $X_i$ is a normal random variable with
mean and variance given, respectively, by

$$
\begin{aligned}
E[X_i] &= \mu_i \\
\mathrm{Var}(X_i) &= \sum_{j=1}^{n} a_{ij}^2
\end{aligned}
$$

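Let us now consider

$$
M(t_1, \ldots, t_m) = E[\exp\{t_1 X_1 + \cdots + t_m X_m\}]
$$

the joint moment generating function of $X_1, \ldots, X_m$. The first thing to note is that
since $\sum_{i=1}^{m} t_i X_i$ is itself a linear combination of the independent normal random
variables $Z_1, \ldots, Z_n$, it is also normally distributed. Its mean and variance are

$$
E\left[\sum_{i=1}^{m} t_i X_i\right] = \sum_{i=1}^{m} t_i \mu_i
$$

and

$$
\begin{aligned}
\mathrm{Var}\left(\sum_{i=1}^{m} t_i X_i\right) &= \mathrm{Cov}\left(\sum_{i=1}^{m} t_i X_i, \sum_{j=1}^{m} t_j X_j\right) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{m} t_i t_j \mathrm{Cov}(X_i, X_j)
\end{aligned}
$$

Now, if Y is a normal random variable with mean $\mu$ and variance $\sigma^2$, then

$$
E[e^Y] = M_Y(t)\big|_{t=1} = e^{\mu + \sigma^2/2}
$$

Thus,

$$
M(t_1, \ldots, t_m) = \exp\left\{\sum_{i=1}^{m} t_i \mu_i + \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} t_i t_j \mathrm{Cov}(X_i, X_j)\right\}
$$

which shows that the joint distribution of $X_1, \ldots, X_m$ is completely determined from
a knowledge of the values of $E[X_i]$ and $\mathrm{Cov}(X_i, X_j)$, i, j = 1, ..., m.
It can be shown that when m = 2, the multivariate normal distribution reduces
to the bivariate normal.


Example 8a

Find P{X < Y} for bivariate normal random variables X and Y having parameters

$$
\mu_x = E[X],\quad \mu_y = E[Y],\quad \sigma_x^2 = \mathrm{Var}(X),\quad \sigma_y^2 = \mathrm{Var}(Y),\quad \rho = \mathrm{Corr}(X, Y)
$$

Solution Because X - Y is normal with mean

$$
E[X - Y] = \mu_x - \mu_y
$$

and variance

$$
\begin{aligned}
\mathrm{Var}(X - Y) &= \mathrm{Var}(X) + \mathrm{Var}(-Y) + 2\,\mathrm{Cov}(X, -Y) \\
&= \sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y
\end{aligned}
$$

we obtain

$$
\begin{aligned}
P\{X < Y\} &= P\{X - Y < 0\} \\
&= P\left\{\frac{X - Y - (\mu_x - \mu_y)}{\sqrt{\sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y}} < \frac{-(\mu_x - \mu_y)}{\sqrt{\sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y}}\right\} \\
&= \Phi\left(\frac{\mu_y - \mu_x}{\sqrt{\sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y}}\right)
\end{aligned}
$$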
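
The final formula is straightforward to evaluate numerically. The sketch below is an illustration, not part of the original text: it expresses the standard normal distribution function through `math.erf`, and the function names and the sample parameter values are assumptions.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z), written in terms of erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_x_less_than_y(mu_x, mu_y, sigma_x, sigma_y, rho):
    """P{X < Y} for bivariate normal X, Y; assumes Var(X - Y) > 0."""
    var_diff = sigma_x**2 + sigma_y**2 - 2 * rho * sigma_x * sigma_y
    return std_normal_cdf((mu_y - mu_x) / math.sqrt(var_diff))

same_means = prob_x_less_than_y(1.0, 1.0, 2.0, 3.0, 0.5)   # equal means give 1/2
shifted = prob_x_less_than_y(0.0, 1.0, 1.0, 1.0, 0.0)      # Phi(1 / sqrt(2))
```

When the means are equal the answer is 1/2 regardless of the variances and the correlation, which is a quick sanity check on the formula.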


Example 8b

Suppose that the conditional distribution of X, given that $\Theta = \theta$, is normal with
mean $\theta$ and variance 1. Moreover, suppose that $\Theta$ itself is a normal random
variable with mean $\mu$ and variance $\sigma^2$. Find the conditional distribution of $\Theta$ given that
X = x.


Solution Rather than using and then simplifying Bayes's formula, we will solve this
problem by first showing that X, $\Theta$ has a bivariate normal distribution. To do so, note
that the joint density function of X, $\Theta$ can be written as

$$
f_{X,\Theta}(x, \theta) = f_{X|\Theta}(x \mid \theta) f_\Theta(\theta)
$$

where $f_{X|\Theta}(x \mid \theta)$ is a normal density with mean $\theta$ and variance 1. However, if we let Z
be a standard normal random variable that is independent of $\Theta$, then the conditional
distribution of $Z + \Theta$, given that $\Theta = \theta$, is also normal with mean $\theta$ and variance 1.
Consequently, the joint density of $Z + \Theta$, $\Theta$ is the same as that of X, $\Theta$. Because the
former joint density is clearly bivariate normal (since $Z + \Theta$ and $\Theta$ are both linear
combinations of the independent normal random variables Z and $\Theta$), it follows that
X, $\Theta$ has a bivariate normal distribution. Now,

$$
\begin{aligned}
E[X] &= E[Z + \Theta] = \mu \\
\mathrm{Var}(X) &= \mathrm{Var}(Z + \Theta) = 1 + \sigma^2
\end{aligned}
$$

and

$$
\begin{aligned}
\rho &= \mathrm{Corr}(X, \Theta) \\
&= \mathrm{Corr}(Z + \Theta, \Theta) \\
&= \frac{\mathrm{Cov}(Z + \Theta, \Theta)}{\sqrt{\mathrm{Var}(Z + \Theta)\mathrm{Var}(\Theta)}} \\
&= \frac{\sigma}{\sqrt{1 + \sigma^2}}
\end{aligned}
$$

Because X, $\Theta$ has a bivariate normal distribution, the conditional distribution of $\Theta$,
given that X = x, is normal with mean

$$
\begin{aligned}
E[\Theta \mid X = x] &= E[\Theta] + \rho \sqrt{\frac{\mathrm{Var}(\Theta)}{\mathrm{Var}(X)}}\,(x - E[X]) \\
&= \mu + \frac{\sigma^2}{1 + \sigma^2}(x - \mu)
\end{aligned}
$$

and variance

$$
\begin{aligned}
\mathrm{Var}(\Theta \mid X = x) &= \mathrm{Var}(\Theta)(1 - \rho^2) \\
&= \frac{\sigma^2}{1 + \sigma^2}
\end{aligned}
$$
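
The conditional mean and variance just derived translate into a two-line computation. This is a minimal sketch, not part of the original text; the function name `posterior_of_theta` and the sample inputs are assumptions.

```python
def posterior_of_theta(x, mu, sigma2):
    """Conditional (mean, variance) of Theta given X = x.

    Assumes X | Theta = theta is normal (theta, 1) and Theta is normal (mu, sigma2).
    """
    weight = sigma2 / (1.0 + sigma2)        # equals rho * sqrt(Var(Theta) / Var(X))
    return mu + weight * (x - mu), weight   # the conditional variance is the same ratio

# Illustrative inputs (assumed): prior mean 1, prior variance 4, observation x = 3.
post_mean, post_var = posterior_of_theta(x=3.0, mu=1.0, sigma2=4.0)
```

With these inputs the conditional mean moves four-fifths of the way from the prior mean toward the observation, and the conditional variance is smaller than both the prior variance and 1.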
7.8.2 The Joint Distribution of the Sample Mean and Sample Variance
Let $X_1, \ldots, X_n$ be independent normal random variables, each with mean $\mu$ and
variance $\sigma^2$. Let $\bar{X} = \sum_{i=1}^{n} X_i/n$ denote their sample mean. Since the sum of independent
normal random variables is also a normal random variable, it follows that $\bar{X}$ is a
normal random variable with (from Examples 2c and 4a) expected value $\mu$ and variance
$\sigma^2/n$.
Now, recall from Example 4e that

$$
\mathrm{Cov}(\bar{X}, X_i - \bar{X}) = 0, \quad i = 1, \ldots, n \tag{8.1}
$$

Also, note that since $\bar{X}, X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ are all linear
combinations of the independent standard normals $(X_i - \mu)/\sigma$, i = 1, ..., n, it follows that
$\bar{X}, X_i - \bar{X}$, i = 1, ..., n has a joint distribution that is multivariate normal. If we let
Y be a normal random variable, with mean $\mu$ and variance $\sigma^2/n$, that is independent
of the $X_i$, i = 1, ..., n, then $Y, X_i - \bar{X}$, i = 1, ..., n also has a multivariate normal
distribution and, indeed, because of Equation (8.1), has the same expected values
and covariances as the random variables $\bar{X}, X_i - \bar{X}$, i = 1, ..., n. But since a
multivariate normal distribution is determined completely by its expected values and
covariances, it follows that $Y, X_i - \bar{X}$, i = 1, ..., n and $\bar{X}, X_i - \bar{X}$, i = 1, ..., n have
the same joint distribution, thus showing that $\bar{X}$ is independent of the sequence of
deviations $X_i - \bar{X}$, i = 1, ..., n.
Since $\bar{X}$ is independent of the sequence of deviations $X_i - \bar{X}$, i = 1, ..., n, it is
also independent of the sample variance $S^2 \equiv \sum_{i=1}^{n} (X_i - \bar{X})^2/(n - 1)$.
Since we already know that $\bar{X}$ is normal with mean $\mu$ and variance $\sigma^2/n$, it
remains only to determine the distribution of $S^2$. To accomplish this, recall, from
Example 4a, the algebraic identity

$$
\begin{aligned}
(n - 1)S^2 &= \sum_{i=1}^{n} (X_i - \bar{X})^2 \\
&= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2
\end{aligned}
$$

Upon dividing the preceding equation by $\sigma^2$, we obtain

$$
\frac{(n - 1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \tag{8.2}
$$

Now,

$$
\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2
$$

is the sum of the squares of n independent standard normal random variables and
so is a chi-squared random variable with n degrees of freedom. Hence, from
Example 7i, its moment generating function is $(1 - 2t)^{-n/2}$. Also, because

$$
\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2
$$

is the square of a standard normal variable, it is a chi-squared random variable with
1 degree of freedom, and so has moment generating function $(1 - 2t)^{-1/2}$. Now, we
have seen previously that the two random variables on the left side of Equation (8.2)
are independent. Hence, as the moment generating function of the sum of independent
random variables is equal to the product of their individual moment generating
functions, we have

$$
E[e^{t(n-1)S^2/\sigma^2}]\,(1 - 2t)^{-1/2} = (1 - 2t)^{-n/2}
$$

or

$$
E[e^{t(n-1)S^2/\sigma^2}] = (1 - 2t)^{-(n-1)/2}
$$

But as $(1 - 2t)^{-(n-1)/2}$ is the moment generating function of a chi-squared random
variable with n - 1 degrees of freedom, we can conclude, since the moment generating
function uniquely determines the distribution of the random variable, that this
is the distribution of $(n - 1)S^2/\sigma^2$.

Summing up, we have shown the following. 


Proposition 8.1
If $X_1, \ldots, X_n$ are independent and identically distributed normal random variables
with mean $\mu$ and variance $\sigma^2$, then the sample mean $\bar{X}$ and the sample variance $S^2$
are independent. $\bar{X}$ is a normal random variable with mean $\mu$ and variance $\sigma^2/n$;
$(n - 1)S^2/\sigma^2$ is a chi-squared random variable with n - 1 degrees of freedom.
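
A simulation can make Proposition 8.1 plausible, although it proves nothing. The sketch below is an illustration outside the original text, with an assumed seed, sample size, and number of replications. It checks three consequences of the proposition: the average of $(n - 1)S^2/\sigma^2$ should be near n - 1, the variance of the sample mean should be near $\sigma^2/n$, and the sample correlation between $\bar{X}$ and $S^2$ should be near 0. Zero correlation is weaker than independence, so this is only a partial check.

```python
import random
import statistics

random.seed(0)                              # fixed seed so the run is repeatable
mu, sigma, n, reps = 2.0, 3.0, 5, 20000     # illustrative settings (assumed)

means, scaled_vars = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    # statistics.variance divides by n - 1, matching the definition of S^2.
    scaled_vars.append((n - 1) * statistics.variance(sample) / sigma**2)

avg_scaled_var = statistics.fmean(scaled_vars)   # a chi-squared mean is n - 1
var_of_mean = statistics.pvariance(means)        # should be near sigma^2 / n

# Sample correlation between the sample mean and the scaled sample variance.
mean_m = statistics.fmean(means)
cov = statistics.fmean(
    (m - mean_m) * (v - avg_scaled_var) for m, v in zip(means, scaled_vars)
)
corr = cov / (statistics.pstdev(means) * statistics.pstdev(scaled_vars))
```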


7.9 General Definition of Expectation 


Up to this point, we have defined expectations only for discrete and continuous
random variables. However, there also exist random variables that are neither discrete
nor continuous, and they, too, may possess an expectation. As an example of such a
random variable, let X be a Bernoulli random variable with parameter $p = \frac{1}{2}$, and let
Y be a uniformly distributed random variable over the interval [0, 1]. Furthermore,
suppose that X and Y are independent, and define the new random variable W by

$$
W = \begin{cases} X & \text{if } X = 1 \\ Y & \text{if } X \ne 1 \end{cases}
$$

Clearly, W is neither a discrete (since its set of possible values, [0, 1], is uncountable)
nor a continuous (since $P\{W = 1\} = \frac{1}{2}$) random variable.

In order to define the expectation of an arbitrary random variable, we require
the notion of a Stieltjes integral. Before defining this integral, let us recall that for
any function g, $\int_a^b g(x)\,dx$ is defined by

$$
\int_a^b g(x)\,dx = \lim \sum_{i=1}^{n} g(x_i)(x_i - x_{i-1})
$$

where the limit is taken over all $a = x_0 < x_1 < x_2 < \cdots < x_n = b$ as $n \to \infty$ and where
$\max_{i=1,\ldots,n}(x_i - x_{i-1}) \to 0$.

For any distribution function F, we define the Stieltjes integral of the nonnegative
function g over the interval [a, b] by

$$
\int_a^b g(x)\,dF(x) = \lim \sum_{i=1}^{n} g(x_i)[F(x_i) - F(x_{i-1})]
$$

where, as before, the limit is taken over all $a = x_0 < x_1 < \cdots < x_n = b$ as $n \to \infty$ and
where $\max_{i=1,\ldots,n}(x_i - x_{i-1}) \to 0$. Further, we define the Stieltjes integral over the whole
real line by

$$
\int_{-\infty}^{\infty} g(x)\,dF(x) = \lim_{\substack{a \to -\infty \\ b \to +\infty}} \int_a^b g(x)\,dF(x)
$$


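Finally, if g is not a nonnegative function, we define $g^+$ and $g^-$ by

$$
g^+(x) = \begin{cases} g(x) & \text{if } g(x) \ge 0 \\ 0 & \text{if } g(x) < 0 \end{cases}
$$

$$
g^-(x) = \begin{cases} 0 & \text{if } g(x) \ge 0 \\ -g(x) & \text{if } g(x) < 0 \end{cases}
$$

Because $g(x) = g^+(x) - g^-(x)$ and $g^+$ and $g^-$ are both nonnegative functions, it is
natural to define

$$
\int_{-\infty}^{\infty} g(x)\,dF(x) = \int_{-\infty}^{\infty} g^+(x)\,dF(x) - \int_{-\infty}^{\infty} g^-(x)\,dF(x)
$$

and we say that $\int_{-\infty}^{\infty} g(x)\,dF(x)$ exists as long as $\int_{-\infty}^{\infty} g^+(x)\,dF(x)$ and $\int_{-\infty}^{\infty} g^-(x)\,dF(x)$
are not both equal to $+\infty$.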
If X is an arbitrary random variable having cumulative distribution F, we define
the expected value of X by

$$
E[X] = \int_{-\infty}^{\infty} x\,dF(x) \tag{9.1}
$$

It can be shown that if X is a discrete random variable with mass function p(x), then

$$
\int_{-\infty}^{\infty} x\,dF(x) = \sum_{x:\,p(x) > 0} x\,p(x)
$$

whereas if X is a continuous random variable with density function f(x), then

$$
\int_{-\infty}^{\infty} x\,dF(x) = \int_{-\infty}^{\infty} x f(x)\,dx
$$


The reader should note that Equation (9.1) yields an intuitive definition of E[X];
consider the approximating sum

$$
\sum_{i=1}^{n} x_i [F(x_i) - F(x_{i-1})]
$$

of E[X]. Because $F(x_i) - F(x_{i-1})$ is just the probability that X will be in the interval
$(x_{i-1}, x_i]$, the approximating sum multiplies the approximate value of X when it is in
the interval $(x_{i-1}, x_i]$ by the probability that it will be in that interval and then sums
over all the intervals. Clearly, as these intervals get smaller and smaller in length, we
obtain the “expected value” of X.

Stieltjes integrals are mainly of theoretical interest because they yield a compact 
way of defining and dealing with the properties of expectation. For instance, the 
use of Stieltjes integrals avoids the necessity of having to give separate statements 
and proofs of theorems for the continuous and the discrete cases. However, their 
properties are very much the same as those of ordinary integrals, and all of the proofs 
presented in this chapter can easily be translated into proofs in the general case. 
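
To make the approximating sum concrete, consider again the random variable W defined at the start of this section, which equals 1 with probability 1/2 and is otherwise uniform on [0, 1]. The sketch below is an illustration and not part of the original text: the distribution function `cdf_w` follows from that definition, while the function names, the equally spaced partition, and the number of subintervals are assumptions. Conditioning on X gives E[W] = (1/2)(1) + (1/2)(1/2) = 3/4, and the sum should approach that value.

```python
def cdf_w(w):
    """Distribution function of W: uniform part with weight 1/2, point mass 1/2 at w = 1."""
    if w < 0:
        return 0.0
    if w < 1:
        return 0.5 * w
    return 1.0

def stieltjes_sum(g, cdf, a, b, n):
    """Sum of g(x_i) [F(x_i) - F(x_{i-1})] over n equal subintervals of [a, b]."""
    total, prev = 0.0, cdf(a)
    for i in range(1, n + 1):
        x = a + (b - a) * i / n
        cur = cdf(x)
        total += g(x) * (cur - prev)
        prev = cur
    return total

expected_w = stieltjes_sum(lambda x: x, cdf_w, 0.0, 1.0, 10000)
```

The jump of F at w = 1 is picked up by the last term of the sum, which is how the point mass contributes to the expectation without any separate discrete case.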


Summary 


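If X and Y have a joint probability mass function p(x, y),
then

$$
E[g(X, Y)] = \sum_y \sum_x g(x, y)\,p(x, y)
$$

whereas if they have a joint density function f(x, y), then

$$
E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f(x, y)\,dx\,dy
$$

A consequence of the preceding equations is that

$$
E[X + Y] = E[X] + E[Y]
$$

which generalizes to

$$
E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]
$$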


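The covariance between random variables X and Y is
given by

$$
\begin{aligned}
\mathrm{Cov}(X, Y) &= E[(X - E[X])(Y - E[Y])] \\
&= E[XY] - E[X]E[Y]
\end{aligned}
$$

A useful identity is

$$
\mathrm{Cov}\left(\sum_{i=1}^{n} X_i, \sum_{j=1}^{m} Y_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{m} \mathrm{Cov}(X_i, Y_j)
$$

When n = m and $Y_i = X_i$, i = 1, ..., n, the preceding
formula gives

$$
\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2 \sum\sum_{i<j} \mathrm{Cov}(X_i, X_j)
$$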
The correlation between X and Y, denoted by $\rho(X, Y)$, is
defined by

$$
\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}
$$

If X and Y are jointly discrete random variables, then the
conditional expected value of X, given that Y = y, is
defined by

$$
E[X \mid Y = y] = \sum_x x\,P\{X = x \mid Y = y\}
$$

If X and Y are jointly continuous random variables, then

$$
E[X \mid Y = y] = \int_{-\infty}^{\infty} x f_{X|Y}(x \mid y)\,dx
$$

where

$$
f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}
$$

is the conditional probability density of X given that
Y = y. Conditional expectations, which are similar to
ordinary expectations except that all probabilities are now
computed conditional on the event that Y = y, satisfy all
the properties of ordinary expectations.

Let E[X|Y] denote that function of Y whose value at
Y = y is E[X|Y = y]. A very useful identity is

$$
E[X] = E[E[X \mid Y]]
$$

In the case of discrete random variables, this equation
reduces to the identity

$$
E[X] = \sum_y E[X \mid Y = y]\,P\{Y = y\}
$$

and, in the continuous case, to

$$
E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y] f_Y(y)\,dy
$$

The preceding equations can often be applied to obtain
E[X] by first “conditioning” on the value of some other
random variable Y. In addition, since, for any event A,
$P(A) = E[I_A]$, where $I_A$ is 1 if A occurs and is 0 otherwise,
we can use the same equations to compute probabilities.

The conditional variance of X, given that Y = y, is
defined by

$$
\mathrm{Var}(X \mid Y = y) = E[(X - E[X \mid Y = y])^2 \mid Y = y]
$$

Let Var(X|Y) be that function of Y whose value at Y = y
is Var(X|Y = y). The following is known as the conditional
variance formula:

$$
\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])
$$

Suppose that the random variable X is to be observed and,
on the basis of its value, one must then predict the value of
the random variable Y. In such a situation, it turns out that
among all predictors, E[Y|X] has the smallest expectation
of the square of the difference between it and Y.

The moment generating function of the random variable
X is defined by

$$
M(t) = E[e^{tX}]
$$

The moments of X can be obtained by successively differentiating
M(t) and then evaluating the resulting quantity
at t = 0. Specifically, we have

$$
E[X^n] = \frac{d^n}{dt^n} M(t) \Big|_{t=0} \qquad n = 1, 2, \ldots
$$

Two useful results concerning moment generating functions
are, first, that the moment generating function
uniquely determines the distribution function of the
random variable and, second, that the moment generating
function of the sum of independent random variables
is equal to the product of their moment generating functions.
These results lead to simple proofs that the sum of
independent normal (Poisson, gamma) random variables
remains a normal (Poisson, gamma) random variable.

If $X_1, \ldots, X_m$ are all linear combinations of a finite
set of independent standard normal random variables,
then they are said to have a multivariate normal distribution.
Their joint distribution is specified by the values of
$E[X_i]$, $\mathrm{Cov}(X_i, X_j)$, i, j = 1, ..., m.

If $X_1, \ldots, X_n$ are independent and identically distributed
normal random variables, then their sample mean

$$
\bar{X} = \sum_{i=1}^{n} \frac{X_i}{n}
$$

and their sample variance

$$
S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}
$$

are independent. The sample mean $\bar{X}$ is a normal random
variable with mean $\mu$ and variance $\sigma^2/n$; the random variable
$(n - 1)S^2/\sigma^2$ is a chi-squared random variable with
n - 1 degrees of freedom.


Problems


7.1. A player throws a fair die and simultaneously flips a
fair coin. If the coin lands heads, then she wins twice, and
if tails, then she wins one-half of the value that appears on
the die. Determine her expected winnings.


7.2. The game of Clue involves 6 suspects, 6 weapons, and
9 rooms. One of each is randomly chosen and the object of
the game is to guess the chosen three.

(a) How many solutions are possible?

In one version of the game, the selection is made and then
each of the players is randomly given three of the remaining
cards. Let S, W, and R be, respectively, the numbers
of suspects, weapons, and rooms in the set of three cards
given to a specified player. Also, let X denote the number
of solutions that are possible after that player observes his
or her three cards.

(b) Express X in terms of S, W, and R.

(c) Find E[X].

7.3. Daily price movements of an asset are independent,
and it is twice as likely for the price to go up than down.
Let D represent the difference between the up and down
movements of an asset that is sold on its first downward
movement. Find

(a) P{D < 0}

(b) P{D = 1}

(c) E[D]


7.4. If X and Y have joint density function

$$
f_{X,Y}(x, y) = \begin{cases} \frac{1}{3}(x + y), & 0 < x < 1,\ 0 < y < 2 \\ 0, & \text{otherwise} \end{cases}
$$

find
(a) E[XY]
(b) E[X]
(c) $E[Y^2]$


7.5. A city in the shape of a rectangle stretches 5 kilo- 
meters from west to east and 3 kilometers from north to 
south. A rescue helicopter waits in a helipad just outside 
the city near the south-western corner, with coordinates 
(0,0). A rescue call, which follows a uniform distribution, 
can arrive at any point (x, y) in the city. Find the expected 
distance covered by the helicopter in travelling to this 
point. 


7.6. A fair die is rolled 10 times. Calculate the expected 
sum of the 10 rolls. 


7.7. Eight medical tests are run independently by 2 sepa- 
rate labs, with probability of a correct result equal to .95. 
Find the expected number of times that 


(a) both labs provide a correct result; 
(b) both labs provide a wrong result; 
(c) only 1 lab provides a correct result. 


7.8. N people arrive separately to a professional dinner.
Upon arrival, each person looks to see if he or she has
any friends among those present. That person then sits
either at the table of a friend or at an unoccupied table
if none of those present is a friend. Assuming that each
of the $\binom{N}{2}$ pairs of people is, independently, a pair of
friends with probability p, find the expected number of
occupied tables.

Hint: Let X; equal 1 or 0, depending on whether the ith 
arrival sits at a previously unoccupied table. 


7.9. A total of n balls, numbered 1 through n, are put into
n urns, also numbered 1 through n in such a way that ball i
is equally likely to go into any of the urns 1, 2, ..., i. Find
(a) the expected number of urns that are empty; 

(b) the probability that none of the urns is empty. 


7.10. Consider 3 trials, each having the same probability
of success. Let X denote the total number of successes in
these trials. If E[X] = 1.8, what is


(a) the largest possible value of P{X = 3}?
(b) the smallest possible value of P{X = 3}?


In both cases, construct a probability scenario that results 
in P{X = 3} having the stated value. 

Hint: For part (b), you might start by letting U be a uni- 
form random variable on (0, 1) and then defining the trials 
in terms of the value of U. 


7.11. Consider n independent flips of a coin having probability
p of landing on heads. Say that a changeover occurs
whenever an outcome differs from the one preceding it.
For instance, if n = 5 and the outcome is HHTHT, then
there are 3 changeovers. Find the expected number of
changeovers.

Hint: Express the number of changeovers as the sum of 
n — 1 Bernoulli random variables. 


7.12. A group of m men and n women is lined up at 
random. 

(a) Find the expected number of men who have a woman 
next to them. 


(b) Repeat part (a), but now assuming that the group is 
randomly seated at a round table. 


7.13. In a leap year, each of 366 people with different
birthdays picks one of 366 travel tickets for different days of
the year. If those who pick tickets corresponding to their
birthdays can avail themselves of free travel, find the expected
number of people who cannot.


7.14. An urn has m black balls. At each stage, a black ball 
is removed and a new ball that is black with probability p 
and white with probability 1 — p is put in its place. Find 
the expected number of stages needed until there are no 
more black balls in the urn. 

NOTE: The preceding has possible applications to under- 
standing the AIDS disease. Part of the body’s immune 
system consists of a certain class of cells known as T-cells. 
There are 2 types of T-cells, called CD4 and CD8. Now, 
while the total number of T-cells in AIDS sufferers is (at 
least in the early stages of the disease) the same as that 
in healthy individuals, it has recently been discovered that 
the mix of CD4 and CD8 T-cells is different. Roughly 60 
percent of the T-cells of a healthy person are of the CD4 
type, whereas the percentage of the T-cells that are of 
CD4 type appears to decrease continually in AIDS suf- 
ferers. A recent model proposes that the HIV virus (the 
virus that causes AIDS) attacks CD4 cells and that the 
body’s mechanism for replacing killed T-cells does not dif- 
ferentiate between whether the killed T-cell was CD4 or 
CD8. Instead, it just produces a new T-cell that is CD4
with probability .6 and CD8 with probability .4. However,
although this would seem to be a very efficient way of
replacing killed T-cells when each one killed is equally
likely to be any of the body’s T-cells (and thus has prob- 
ability .6 of being CD4), it has dangerous consequences 
when facing a virus that targets only the CD4 T-cells. 


7.15. In Example 2h, say that i and j, i ≠ j, form a matched
pair if i chooses the hat belonging to j and j chooses the
hat belonging to i. Find the expected number of matched
pairs.


7.16. Let Z be an exponentially distributed random variable
with rate $\lambda$. For some fixed x > 0, set


Zz 

e L, 2X 
X= 

{2 Dek 


Find E[X] in terms of $\lambda$ and x.


7.17. A deck of n cards numbered 1 through n is thoroughly
shuffled so that all possible n! orderings can be assumed
to be equally likely. Suppose you are to make n guesses
sequentially, where the ith one is a guess of the card in
position i. Let N denote the number of correct guesses.


(a) If you are not given any information about your earlier
guesses, show that for any strategy, E[N] = 1.
(b) Suppose that after each guess you are shown the card
that was in the position in question. What do you think is
the best strategy? Show that under this strategy,

$$
\begin{aligned}
E[N] &= \frac{1}{n} + \frac{1}{n - 1} + \cdots + 1 \\
&\approx \int_1^n \frac{1}{x}\,dx = \log n
\end{aligned}
$$

(c) Suppose that you are told after each guess whether you
are right or wrong. In this case, it can be shown that the
strategy that maximizes E[N] is one that keeps on guessing
the same card until you are told you are correct and
then changes to a new card. For this strategy, show that

$$
\begin{aligned}
E[N] &= 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!} \\
&\approx e - 1
\end{aligned}
$$

Hint: For all parts, express N as the sum of indicator (that
is, Bernoulli) random variables.


7.18. On a TV show, 10 couples are separated from each 
other and placed in two groups. Those in the first group 
pick a room when the second group is not looking. Each 
member of the second group then selects a room at ran- 
dom and independent of every other member in the group. 
Each room can be selected more than once or not at 
all. Compute the expected number of times a couple is 
reunited when the room is selected. 


7.19. A certain region is inhabited by r distinct types 
of a certain species of insect. Each insect caught will, 
independently of the types of the previous catches, be of 
type i with probability 

$$P_i, \quad i = 1, \ldots, r, \qquad \sum_{i=1}^{r} P_i = 1$$


(a) Compute the mean number of insects that are caught 
before the first type 1 catch. 

(b) Compute the mean number of types of insects that are 
caught before the first type 1 catch. 


7.20. In an urn containing n balls, the ith ball has weight 
W(i),i = 1,...,n. The balls are removed without replace- 
ment, one at a time, according to the following rule: At 
each selection, the probability that a given ball in the urn 
is chosen is equal to its weight divided by the sum of the 
weights remaining in the urn. For instance, if at some time 
$i_1, \ldots, i_r$ is the set of balls remaining in the urn, then the 
next selection will be $i_j$ with probability 
$W(i_j) \big/ \sum_{k=1}^{r} W(i_k)$, $j = 1, \ldots, r$. Compute the expected number of balls that 
are withdrawn before ball number 1 is removed. 


7.21. Fifty people are placed randomly in 200 rooms. 
Compute 

(a) the expected number of rooms containing exactly 2 
people; 

(b) the expected number of non-vacant rooms. 


7.22. How many times would you expect to roll a fair die 
before all 6 sides appeared at least once? 


7.23. Urn 1 contains 5 white and 6 black balls, while urn 2 
contains 8 white and 10 black balls. Two balls are randomly 
selected from urn 1 and are put into urn 2. If 3 balls are 
then randomly selected from urn 2, compute the expected 
number of white balls in the trio. 

Hint: Let $X_i = 1$ if the ith white ball initially in urn 1 is 
one of the three selected, and let $X_i = 0$ otherwise. Similarly, 
let $Y_i = 1$ if the ith white ball from urn 2 is one of the 
three selected, and let $Y_i = 0$ otherwise. The number of 
white balls in the trio can now be written as $\sum_{i=1}^{5} X_i + \sum_{i=1}^{8} Y_i$. 


7.24. A bottle initially contains m large pills and n small 
pills. Each day, a patient randomly chooses one of the pills. 
If a small pill is chosen, then that pill is eaten. If a large 
pill is chosen, then the pill is broken in two; one part is 
returned to the bottle (and is now considered a small pill) 
and the other part is then eaten. 


(a) Let X denote the number of small pills in the bottle 
after the last large pill has been chosen and its smaller half 
returned. Find E[X]. 

Hint: Define n + m indicator variables, one for each of the 
small pills initially present and one for each of the m small 
pills created when a large one is split in two. Now use the 
argument of Example 2m. 


(b) Let Y denote the day on which the last large pill is cho- 
sen. Find E[Y]. 
Hint: What is the relationship between X and Y? 


7.25. Let $X_1, X_2, \ldots$ be a sequence of independent and 
identically distributed continuous random variables. Let 
$N \ge 2$ be such that 


$$X_1 \ge X_2 \ge \cdots \ge X_{N-1} < X_N$$


That is, N is the point at which the sequence stops decreas- 
ing. Show that E[N] = e. 
Hint: First find P{N = n}. 
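The claimed value E[N] = e can be checked by simulation before attempting the proof. The sketch below assumes uniform (0, 1) variables (any continuous distribution would give the same distribution of N) and uses hypothetical helper names `sample_n` and `estimate_mean_n`.

```python
import math
import random


def sample_n(rng):
    """Return N, the first index at which the sequence stops decreasing."""
    prev = rng.random()
    n = 2
    while True:
        cur = rng.random()
        if cur > prev:  # X_{n-1} < X_n: the decrease has stopped
            return n
        prev = cur
        n += 1


def estimate_mean_n(trials=200_000, seed=0):
    """Monte Carlo estimate of E[N] with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return sum(sample_n(rng) for _ in range(trials)) / trials


print(estimate_mean_n(), math.e)
```

The two printed numbers should agree to roughly two decimal places; the simulation is a plausibility check, not a substitute for the argument via P{N = n}.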


7.26. Let $X_1, \ldots, X_n$ be independent and identically 
distributed random variables having an exponential distribution 
with parameter λ. Find 


(a) the expected value of the minimum of $X_1, \ldots, X_n$; 


(b) the expected value of the median of $X_1, \ldots, X_n$ if n is 
odd. 


7.27. If n prizes are randomly distributed among k participants 
in a lottery, show that at least one participant has at 
least $\lceil n/k \rceil$ prizes with probability 1. 


*7.28. The k-of-r-out-of-n circular reliability system, $k \le r \le n$, 
consists of n components that are arranged in a circular 
fashion. Each component is either functional or failed, 
and the system functions if there is no block of r consecutive 
components of which at least k are failed. Show 
that there is no way to arrange 47 components, 8 of which 
are failed, to make a functional 3-of-12-out-of-47 circular 
system. 


7.29. There are 4 different types of coupons, the first 2 of 
which comprise one group and the second 2 another group. 
Each new coupon obtained is type i with probability $p_i$, 
where $p_1 = p_2 = 1/8$, $p_3 = p_4 = 3/8$. Find the expected 
number of coupons that one must obtain to have at least 
one of 

(a) all 4 types; 

(b) all the types of the first group; 

(c) all the types of the second group; 

(d) all the types of either group. 


7.30. X and Y are independent exponentially distributed 
random variables with rate parameters $\lambda_1$ and $\lambda_2$, respectively. 
Find 


Axe | 
in terms of $\lambda_1$ and $\lambda_2$. 


7.31. In Problem 7.6, calculate the variance of the sum of 
the rolls. 


7.32. In Problem 7.9, compute the variance of the number 
of empty urns. 


7.33. If E[X(X − 1)] = 3 and E[X] = 2, find for constants 
a, b 

(a) $E[(a - bX)^2]$; 

(b) Var(bX − a). 


7.34. In a circle, 3 Germans, 3 Brazilians, 3 Japanese, and 
3 Russians randomly position themselves. Find the expec- 
tation and variance of the number of people of the same 
nationalities standing next to each other. 


7.35. A fair six-sided die is rolled sequentially. Compute 
the expected number of times a die needs to be rolled in 
order to obtain 


(a) 2 odd numbers; 
(b) 7 sixes; 
(c) all the numbers. 


7.36. Outcomes of n successive independent trials are 
either positive, negative (each with probability .4), or 
inconclusive. Find Cov(X, Y) where X, Y represent the 
number of positive and negative outcomes, respectively. 


7.37. $U_1$, $U_2$ are independently and uniformly generated 
between 0 and 1 by a machine. Let X denote their sum 
and let Y be equal to $1 - U_2$. Compute Cov(X, Y). 


7.38. Suppose X and Y have the following joint probability 
mass function: 


$$p(i, j) = \begin{cases} \dfrac{3 + 2i - j}{45} & i, j \in \{1, 2, 3\} \\ 0 & \text{elsewhere} \end{cases}$$


(a) Find E[X] and E[Y]. 

(b) Find Var(X) and Var(Y). 
(c) Find Cov(X, Y). 

(d) Find Corr(X, Y). 
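For a small discrete joint distribution such as this one, all four parts reduce to finite sums that can be evaluated exactly. The sketch below assumes the joint mass function is p(i, j) = (3 + 2i − j)/45 on {1, 2, 3} × {1, 2, 3}, as read from the statement above; the helper `expect` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from math import sqrt

support = [1, 2, 3]
# Assumed joint pmf, read off the exercise statement: p(i, j) = (3 + 2i - j)/45.
pmf = {(i, j): Fraction(3 + 2 * i - j, 45) for i in support for j in support}


def expect(g):
    """E[g(X, Y)] under the joint pmf, as an exact fraction."""
    return sum(g(i, j) * p for (i, j), p in pmf.items())


total = expect(lambda i, j: 1)  # should be 1 for a valid pmf
ex, ey = expect(lambda i, j: i), expect(lambda i, j: j)
var_x = expect(lambda i, j: i * i) - ex ** 2
var_y = expect(lambda i, j: j * j) - ey ** 2
cov = expect(lambda i, j: i * j) - ex * ey
corr = float(cov) / sqrt(float(var_x) * float(var_y))
print(total, ex, ey, var_x, var_y, cov, corr)
```

Checking that `total` equals 1 first is a cheap way to confirm that the mass function has been transcribed correctly before trusting the moments.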


7.39. Suppose that 2 balls are randomly removed from an 
urn containing n red and m blue balls. Let $X_i = 1$ if the ith 
ball removed is red, and let it be 0 otherwise, i = 1, 2. 

(a) Do you think that Cov($X_1$, $X_2$) is negative, zero, or 
positive? 

(b) Validate your answer to part (a). 

Suppose the red balls are numbered, and let $Y_i$ equal 1 if 
red ball number i is removed, and let it be 0 if that ball is 
not removed. 

(c) Do you think that Cov($Y_1$, $Y_2$) is negative, zero, or 
positive? 

(d) Validate your answer to part (c). 


7.40. The random variables X and Y have a joint density 
function given by 
2 ln x 


1 ≤ x ≤ 3, 0 ≤ y ≤ x 
otherwise 


Compute Cov(X, Y). 


7.41. Let $X_1, \ldots$ be independent with common mean μ and 
common variance $\sigma^2$, and set $Y_n = X_n + X_{n+1} + X_{n+2}$. 
For $j \ge 0$, find Cov($Y_n$, $Y_{n+j}$). 


7.42. The joint density function of X and Y is given by 


$$f(x, y) = \frac{1}{y}\, e^{-(y + x/y)}, \quad x > 0,\ y > 0$$

Find E[X], E[Y], and show that Cov(X, Y) = 1. 


7.43. Five chits are picked from an urn containing 42 chits 
with numbers written on each. Before this selection, a per- 
son picks 5 chits and wins if he guesses 3 numbers correctly. 
What are the mean and variance of the guessed numbers? 
What is the probability of winning? 


7.44. A group of 20 people consisting of 10 men and 10 
women is randomly arranged into 10 pairs of 2 each. Com- 
pute the expectation and variance of the number of pairs 
that consist of a man and a woman. Now suppose the 20 
people consist of 10 married couples. Compute the mean 
and variance of the number of married couples that are 
paired together. 


7.45. Let $X_1, X_2, \ldots, X_n$ be independent random variables 
having an unknown continuous distribution function F, 
and let $Y_1, Y_2, \ldots, Y_m$ be independent random variables 
having an unknown continuous distribution function G. 
Now order those n + m variables, and let 

$$I_i = \begin{cases} 1 & \text{if the ith smallest of the } n + m \text{ variables is from the X sample} \\ 0 & \text{otherwise} \end{cases}$$

The random variable $R = \sum_{i=1}^{n+m} i I_i$ is the sum of the ranks 

of the X sample and is the basis of a standard statistical 
procedure (called the Wilcoxon sum-of-ranks test) for test- 
ing whether F and G are identical distributions. This test 
accepts the hypothesis that F = G when R is neither too 
large nor too small. Assuming that the hypothesis of equal- 
ity is in fact correct, compute the mean and variance of R. 
Hint: Use the results of Example 3e. 


7.46. Between two distinct methods for manufacturing certain 
goods, the quality of goods produced by method i is 
a continuous random variable having distribution $F_i$, i = 1, 2. 
Suppose that n goods are produced by method 1 and 
m by method 2. Rank the n + m goods according to quality, 
and let 

$$X_j = \begin{cases} 1 & \text{if the jth best was produced from method 1} \\ 2 & \text{otherwise} \end{cases}$$

For the vector $X_1, X_2, \ldots, X_{n+m}$, which consists of n 1's 
and m 2's, let R denote the number of runs of 1. For 
instance, if n = 5, m = 2, and X = 1, 2, 1, 1, 1, 1, 2, then 
R = 2. If $F_1 = F_2$ (that is, if the two methods produce identically 
distributed goods), what are the mean and variance 
of R? 


7.47. Let $X_1$, $X_2$, and $X_3$ be pairwise uncorrelated and 
gamma distributed, with shape and scale parameters k and 
θ. Compute in terms of k and θ the correlations of 


(a) X; and _X; + X; 
(b) X, + 2X2 and X; + X2. + X3. 


7.48. Consider the following dice game, as played at a certain 
gambling casino: Players 1 and 2 roll a pair of dice in 
turn. The bank then rolls the dice to determine the outcome 
according to the following rule: Player i, i = 1, 2, 
wins if his roll is strictly greater than the bank's. For i = 1, 2, 
let 

$$I_i = \begin{cases} 1 & \text{if } i \text{ wins} \\ 0 & \text{otherwise} \end{cases}$$

and show that $I_1$ and $I_2$ are positively correlated. Explain 
why this result was to be expected. 


7.49. Consider a graph having n vertices labeled 1, 2, ..., n, 
and suppose that, between each of the $\binom{n}{2}$ pairs of distinct 
vertices, an edge is independently present with probability 
p. The degree of vertex i, designated as $D_i$, is the number 
of edges that have vertex i as one of their vertices. 


(a) What is the distribution of $D_i$? 
(b) Find $\rho(D_i, D_j)$, the correlation between $D_i$ and $D_j$. 


7.50. A fair die is rolled successively. Let X and Y denote 
the sum of scores and the absolute difference of scores 
respectively. Find 


(a) E[Y]; 
(b) E[X|Y = 1]; 
(c) E[Y|X = 7]. 


7.51. A deliveryman delivers 20 parcels in a day. Thirty 
percent of the time, the deliveryman works in the Bella 
Vista neighborhood, where there is a 20 percent chance of 
receiving a complaint from any one customer. Otherwise, 
the deliveryman works in the Pembroke neighborhood, 
where the chance of receiving a complaint from any one 
customer is 10 percent. Let X be the number of received 
complaints on any given day. Compute E[X|X > 0]. 


7.52. The joint density of X and Y is given by 
ey 

2(1 — e) 

Given k > 0, derive $E[X^k \mid Y = y]$ in terms of k and y. 


fay) =x + , O<x<10<y<1 


7.53. The joint density of X and Y is given by 


$$f(x, y) = 10x^2 y, \quad 0 < x < 1,\ 0 < y < x$$


Given k > 0, derive $E[Y^k \mid X = x]$ in terms of k and x. 


7.54. A population is made up of r disjoint subgroups. Let 
$p_i$ denote the proportion of the population that is in subgroup 
i, i = 1, ..., r. If the average weight of the members 
of subgroup i is $w_i$, i = 1, ..., r, what is the average weight 
of the members of the population? 


7.55. A prisoner is trapped in a cell containing 3 doors. 
The first door leads to a tunnel that returns him to his 
cell after 2 days’ travel. The second leads to a tunnel 
that returns him to his cell after 4 days’ travel. The 
third door leads to freedom after 1 day of travel. If it is 
assumed that the prisoner will always select doors 1, 2, 
and 3 with respective probabilities .5, .3, and .2, what is 
the expected number of days until the prisoner reaches 
freedom? 


7.56. Consider the following dice game: A pair of dice is 
rolled. If the sum is 7, then the game ends and you win 0. 
If the sum is not 7, then you have the option of either stopping 
the game and receiving an amount equal to that sum 
or starting over again. For each value of i, i = 2, ..., 12, find 
your expected return if you employ the strategy of stopping 
the first time that a value at least as large as i appears. 
What value of i leads to the largest expected return? 

Hint: Let $X_i$ denote the return when you use the critical 
value i. To compute $E[X_i]$, condition on the initial sum. 


7.57. Ten hunters are waiting for ducks to fly by. When a 
flock of ducks flies overhead, the hunters fire at the same 
time, but each chooses his target at random, independently 
of the others. If each hunter independently hits his target 
with probability .6, compute the expected number of ducks 
that are hit. Assume that the number of ducks in a flock is 
a Poisson random variable with mean 6. 


7.58. The number of people who enter an elevator on the 
ground floor is a Poisson random variable with mean 10. 
If there are N floors above the ground floor, and if each 
person is equally likely to get off at any one of the N 
floors, independently of where the others get off, compute 
the expected number of stops that the elevator will make 
before discharging all of its passengers. 


7.59. In a month, the expected number of damage claims 
received by an insurance company is 30. The expected 
sizes of the claims (in units of currency) are independent 
with mean 1000. If the size of each claim is independent 


of the number of claims that occur, compute the expected 
total sum of claims in a month. 


7.60. A player draws a card from a pack of 52 cards and 
replaces it until both a numbered playing card appears and 
an ace, or a king, or a queen, or a jack appears. Find 


(a) the probability that the last drawn card is an ace, or a 
king, or a queen, or a jack; 
(b) the expected number of draws. 


7.61. A coin that comes up heads with probability p is continually 
flipped. Let N be the number of flips until there 
have been both at least n heads and at least m tails. Derive 
an expression for E[N] by conditioning on the number of 
heads in the first n + m flips. 


7.62. There are n + 1 participants in a game. Each person 
independently is a winner with probability p. The winners 
share a total prize of 1 unit. (For instance, if 4 people win, 
then each of them receives 1/4, whereas if there are no winners, 
then none of the participants receives anything.) Let 
A denote a specified one of the players, and let X denote 
the amount that is received by A. 


(a) Compute the expected total prize shared by the 
players. 

(b) Argue that $E[X] = \dfrac{1 - (1 - p)^{n+1}}{n + 1}$. 

(c) Compute E[X] by conditioning on whether A is a winner, 
and conclude that 

$$E[(1 + B)^{-1}] = \frac{1 - (1 - p)^{n+1}}{(n + 1)p}$$

when B is a binomial random variable with parameters n 
and p. 

7.63. Each of m + 2 players pays 1 unit to a kitty in order 
to play the following game: A fair coin is to be flipped 
successively n times, where n is an odd number, and the 
successive outcomes are noted. Before the n flips, each player 
writes down a prediction of the outcomes. For instance, if 
n = 3, then a player might write down (H, H, T), which 
means that he or she predicts that the first flip will land on 
heads, the second on heads, and the third on tails. After 
the coins are flipped, the players count their total number 
of correct predictions. Thus, if the actual outcomes are all 
heads, then the player who wrote (H, H, T) would have 2 
correct predictions. The total kitty of m + 2 is then evenly 
split up among those players having the largest number of 
correct predictions. 

Since each of the coin flips is equally likely to land on 
either heads or tails, m of the players have decided to 
make their predictions in a totally random fashion. Specifically, 
they will each flip one of their own fair coins n times 
and then use the result as their prediction. However, the 
final 2 of the players have formed a syndicate and will 
use the following strategy: One of them will make predictions 
in the same random fashion as the other m players, 
but the other one will then predict exactly the opposite 
of the first. That is, when the randomizing member 
of the syndicate predicts an H, the other member predicts 
a T. For instance, if the randomizing member of the 
syndicate predicts (H, H, T), then the other one predicts 
(T, T, H). 


(a) Argue that exactly one of the syndicate members will 
have more than n/2 correct predictions. (Remember, n is 
odd.) 


(b) Let X denote the number of the m nonsyndicate play- 
ers who have more than n/2 correct predictions. What is 
the distribution of X? 


(c) With X as defined in part (b), argue that 

$$E[\text{payoff to the syndicate}] = (m + 2)\, E\left[\frac{1}{X + 1}\right]$$


(d) Use part (c) of Problem 7.62 to conclude that 

$$E[\text{payoff to the syndicate}] = \frac{2(m + 2)}{m + 1}\left[1 - \left(\frac{1}{2}\right)^{m+1}\right]$$

and explicitly compute this number when m = 1, 2, and 3. 
Because it can be shown that 

$$\frac{2(m + 2)}{m + 1}\left[1 - \left(\frac{1}{2}\right)^{m+1}\right] > 2$$

it follows that the syndicate's strategy always gives it a positive 
expected profit. 


7.64. The number of goals that J scores in soccer games 
that her team wins is Poisson distributed with mean 2, 
while the number she scores in games that her team loses 
is Poisson distributed with mean 1. Assume that, indepen- 
dent of earlier results, J’s team wins each new game it plays 
with probability p. 

(a) Find the expected number of goals that J scores in her 
team’s next game. 

(b) Find the probability that J scores 6 goals in her next 4 
games. 

Hint: Would it be useful to know how many of those games 
were won by J's team? 

Suppose J’s team has just entered a tournament in which 
it will continue to play games until it loses. Let X denote 
the total number of goals scored by J in the tournament. 
Also, let N be the number of games her team plays in the 
tournament. 


(c) Find E[X]. 
(d) Find P(X = 0). 
(e) Find P(N = 3|X = 5). 


7.65. If the level of infection of a tree is x, then each treat- 
ment will independently be successful with probability 
1 — x. Consider a tree whose infection level is assumed 
to be the value of a uniform (0,1) random variable. 


(a) Find the probability that a single treatment will result 
in a cure. 

(b) Find the probability that the first two treatments are 
unsuccessful. 

(c) Find the probability it will take n treatments for the 
tree to be cured. 


7.66. Let $X_1, \ldots$ be independent random variables with 
the common distribution function F, and suppose they 
are independent of N, a geometric random variable with 
parameter p. Let $M = \max(X_1, \ldots, X_N)$. 


(a) Find $P\{M \le x\}$ by conditioning on N. 

(b) Find $P\{M \le x \mid N = 1\}$. 

(c) Find $P\{M \le x \mid N > 1\}$. 

(d) Use (b) and (c) to rederive the probability you found 
in (a). 


7.67. Let $U_1, U_2, \ldots$ be a sequence of independent uniform 
(0, 1) random variables. In Example 5i, we showed that for 
$0 \le x \le 1$, $E[N(x)] = e^x$, where 

$$N(x) = \min\left\{ n : \sum_{i=1}^{n} U_i > x \right\}$$

This problem gives another approach to establishing that 
result. 


(a) Show by induction on n that for $0 < x \le 1$ and all $n \ge 0$, 

$$P\{N(x) \ge n + 1\} = \frac{x^n}{n!}$$

Hint: First condition on $U_1$, and then use the induction 
hypothesis. 

(b) Use part (a) to conclude that 

$$E[N(x)] = e^x$$
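The identity E[N(x)] = e^x for 0 ≤ x ≤ 1 lends itself to a direct simulation check. The sketch below draws uniforms until the running sum exceeds x; the helper names `sample_count` and `estimate_mean`, the trial count, and the seed are illustrative assumptions.

```python
import math
import random


def sample_count(x, rng):
    """Return N(x): how many uniforms are needed for the running sum to exceed x."""
    total = 0.0
    n = 0
    while total <= x:
        total += rng.random()
        n += 1
    return n


def estimate_mean(x, trials=100_000, seed=2):
    """Monte Carlo estimate of E[N(x)] with a fixed seed."""
    rng = random.Random(seed)
    return sum(sample_count(x, rng) for _ in range(trials)) / trials


for x in (0.25, 0.5, 1.0):
    print(x, estimate_mean(x), math.exp(x))
```

Each printed row pairs the simulated mean with e^x; agreement to about two decimal places is what one would expect at this sample size.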


7.68. An urn contains 30 balls, of which 10 are red and 8 
are blue. From this urn, 12 balls are randomly withdrawn. 
Let X denote the number of red and Y the number of blue 
balls that are withdrawn. Find Cov(X, Y) 


(a) by defining appropriate indicator (that is, Bernoulli) 
random variables $X_i$, $Y_j$ such that 
$X = \sum_{i=1}^{10} X_i$, $Y = \sum_{j=1}^{8} Y_j$; 

(b) by conditioning (on either X or Y) to determine 
E[XY]. 


7.69. Suppose the distance covered by tyres of type i follows 
a gamma distribution with shape and scale parameters 
$k_i$ and $\theta_i$, respectively. Given 0 < a < 1, 100a percent 
of cars have tyres of type 1 and the rest have tyres of type 2. 
Let X denote the distance covered by a randomly picked 
tyre. Find 


(a) E[X]; 
(b) Var(X). 


7.70. The number of winter storms in a good year is a Poisson 
random variable with mean 3, whereas the number in a 
bad year is a Poisson random variable with mean 5. If next 
year will be a good year with probability .4 or a bad year 
with probability .6, find the expected value and variance of 
the number of storms that will occur. 


7.71. In Example 5c, compute the variance of the length of 
time until the miner reaches safety. 


7.72. Consider a gambler who, at each gamble, either wins 
or loses her bet with respective probabilities p and 1 − p. 
A popular gambling system known as the Kelley strategy 
is to always bet the fraction 2p − 1 of your current fortune 
when p > 1/2. Compute the expected fortune after n gambles 
of a gambler who starts with x units and employs the 
Kelley strategy. 


7.73. The number of accidents that a person has in a given 
year is a Poisson random variable with mean λ. However, 
suppose that the value of λ changes from person to person, 
being equal to 2 for 60 percent of the population and 3 for 
the other 40 percent. If a person is chosen at random, what 
is the probability that he will have (a) zero accidents and 
(b) exactly 3 accidents in a certain year? What is the conditional 
probability that he will have 3 accidents in a given 
year, given that he had no accidents the preceding year? 


7.74. Repeat Problem 7.73 when the proportion of the population 
having a value of λ less than x is equal to $1 - e^{-x}$. 


7.75. Consider an urn containing a large number of coins, 
and suppose that each of the coins has some probability p 
of turning up heads when it is flipped. However, this value 
of p varies from coin to coin. Suppose that the composi- 
tion of the urn is such that if a coin is selected at random 
from it, then the p-value of the coin can be regarded as 
being the value of a random variable that is uniformly 
distributed over [0, 1]. If a coin is selected at random from 
the urn and flipped twice, compute the probability that 


(a) the first flip results in a head; 
(b) both flips result in heads. 


7.76. In Problem 7.75, suppose that the coin is tossed n 
times. Let X denote the number of heads that occur. 
Show that 

$$P\{X = i\} = \frac{1}{n + 1}, \quad i = 0, 1, \ldots, n$$

Hint: Make use of the fact that 

$$\int_0^1 x^{a-1}(1 - x)^{b-1}\,dx = \frac{(a - 1)!\,(b - 1)!}{(a + b - 1)!}$$

when a and b are positive integers. 


7.77. Suppose that in Problem 7.75, we continue to flip the 
coin until a head appears. Let N denote the number of flips 
needed. Find 


(a) $P\{N \ge i\}$, $i \ge 1$; 
(b) P{N = 3}; 
(c) E[N]. 


7.78. In Example 6b, let S denote the signal sent and R the 
signal received. 


(a) Compute E[R]. 

(b) Compute Var(R). 

(c) Is R normally distributed? 
(d) Compute Cov(R, S). 


7.79. In Example 6c, suppose that X is uniformly distributed 
over (0, 1). If the discretized regions are determined 
by $a_0 = 0$, $a_1 = 1/2$, and $a_2 = 1$, calculate the optimal 
quantizer Y and compute $E[(X - Y)^2]$. 


7.80. The moment generating function of X is given by 
$M_X(t) = \exp\{2e^t - 2\}$ and that of Y by 
$M_Y(t) = \left(\tfrac{3}{4}e^t + \tfrac{1}{4}\right)^{10}$. If X and Y are independent, what are 


(a) P{X + Y = 2}? 
(b) P{XY = 0}? 
(c) E[XY]? 


7.81. The number of patients entering a ward is a Poisson 
distribution with mean 5. The probability of each admitted 
patient testing positive to an infection is .1. Compute the 
joint moment generating function for the total number of 
patients and the number of patients tested positive. 


7.82. The joint density of X and Y is given by 

$$f(x, y) = \frac{1}{\sqrt{2\pi}}\, e^{-y} e^{-(x - y)^2/2}, \quad 0 < y < \infty,\ -\infty < x < \infty$$

(a) Compute the joint moment generating function of X 
and Y. 

(b) Compute the individual moment generating functions. 


7.83. Two envelopes, each containing a check, are placed 
in front of you. You are to choose one of the envelopes, 
open it, and see the amount of the check. At this point, 
either you can accept that amount or you can exchange it 
for the check in the unopened envelope. What should you 
do? Is it possible to devise a strategy that does better than 
just accepting the first envelope? 

Let A and B, A < B, denote the (unknown) amounts of 
the checks, and note that the strategy that randomly selects 
an envelope and always accepts its check has an expected 
return of (A + B)/2. Consider the following strategy: Let 
F(·) be any strictly increasing (that is, continuous) distribution 
function. Choose an envelope randomly and open 
it. If the discovered check has the value x, then accept 
it with probability F(x) and exchange it with probability 
1 − F(x). 


(a) Show that if you employ the latter strategy, then your 
expected return is greater than (A + B)/2. 

Hint: Condition on whether the first envelope has the 
value A or B. 


Now consider the strategy that fixes a value x and then 
accepts the first check if its value is greater than x and 
exchanges it otherwise. 

(b) Show that for any x, the expected return under 
the x-strategy is always at least (A + B)/2 and 
that it is strictly larger than (A + B)/2 if x lies between 
A and B. 

(c) Let X be a continuous random variable on the whole 
line, and consider the following strategy: Generate the 
value of X, and if X = x, then employ the x-strategy of 
part (b). Show that the expected return under this strategy 
is greater than (A + B)/2. 


7.84. Weekly log-returns on two stocks have a bivariate 
normal distribution with common mean 0, standard deviations 
of the first and second stock being .3 and .2 respectively, 
and correlation .5. 


(a) Find the probability that the absolute value of the sum 
of the two log-returns is greater than .2. 

(b) Recalculate the probability if the correlation is −.5. 
(c) Give an intuitive reason why the probability in (b) is 
less than the probability in (a). 


Theoretical Exercises 


7.1. Show that $E[(X - a)^2]$ is minimized at a = E[X]. 


7.2. Suppose that X is a continuous random variable with 
density function f. Show that E[|X − a|] is minimized 
when a is equal to the median of F. 

Hint: Write 

$$E[|X - a|] = \int |x - a| f(x)\,dx$$

Now break up the integral into the regions where x < a 
and where x > a, and differentiate. 


7.3. Prove Proposition 2.1 when 


(a) X and Y have a joint probability mass function; 
(b) X and Y have a joint probability density function and 
$g(x, y) \ge 0$ for all x, y. 


7.4. Let X be a random variable having finite expectation 
μ and variance $\sigma^2$, and let g(·) be a twice differentiable 
function. Show that 

$$E[g(X)] \approx g(\mu) + \frac{g''(\mu)}{2}\,\sigma^2$$

Hint: Expand g(·) in a Taylor series about μ. Use the first 
three terms and ignore the remainder. 


7.5. If $X \ge 0$ and g is a differentiable function such that 
g(0) = 0, show that 

$$E[g(X)] = \int_0^\infty P\{X > t\}\, g'(t)\,dt$$

Hint: Define random variables I(t), $t \ge 0$, so that 

$$g(X) = \int_0^X g'(t)\,dt = \int_0^\infty I(t)\, g'(t)\,dt$$


7.6. Let $A_1, A_2, \ldots, A_n$ be arbitrary events, and define 
$C_k$ = {at least k of the $A_i$ occur}. Show that 

$$\sum_{k=1}^{n} P(C_k) = \sum_{k=1}^{n} P(A_k)$$

Hint: Let X denote the number of the $A_i$ that occur. Show 
that both sides of the preceding equation are equal to 
E[X]. 


7.7. In the text, we noted that 

$$E\left[\sum_{i=1}^{\infty} X_i\right] = \sum_{i=1}^{\infty} E[X_i]$$

when the $X_i$ are all nonnegative random variables. Since 
an integral is a limit of sums, one might expect that 

$$E\left[\int_0^\infty X(t)\,dt\right] = \int_0^\infty E[X(t)]\,dt$$

whenever X(t), $0 \le t < \infty$, are all nonnegative random 
variables; this result is indeed true. Use it to give another 
proof of the result that for a nonnegative random variable 
X, 

$$E[X] = \int_0^\infty P\{X > t\}\,dt$$

Hint: Define, for each nonnegative t, the random variable 
X(t) by 

$$X(t) = \begin{cases} 1 & \text{if } t < X \\ 0 & \text{if } t \ge X \end{cases}$$

Now relate $\int_0^\infty X(t)\,dt$ to X. 


7.8. We say that X is stochastically larger than Y, written 
$X \ge_{st} Y$, if, for all t, 

$$P\{X > t\} \ge P\{Y > t\}$$

Show that if $X \ge_{st} Y$, then $E[X] \ge E[Y]$ when 


(a) X and Y are nonnegative random variables; 


(b) X and Y are arbitrary random variables. 
Hint: Write X as 

$$X = X^+ - X^-$$

where 

$$X^+ = \begin{cases} X & \text{if } X \ge 0 \\ 0 & \text{if } X < 0 \end{cases}, \qquad X^- = \begin{cases} 0 & \text{if } X \ge 0 \\ -X & \text{if } X < 0 \end{cases}$$

Similarly, represent Y as $Y^+ - Y^-$. Then make use of part 
(a). 


7.9. Show that X is stochastically larger than Y if and 
only if 

$$E[f(X)] \ge E[f(Y)]$$

for all increasing functions f. 

Hint: Show that if $X \ge_{st} Y$, then $E[f(X)] \ge E[f(Y)]$ by showing 
that $f(X) \ge_{st} f(Y)$ and then using Theoretical Exercise 7.8. 
To show that if $E[f(X)] \ge E[f(Y)]$ for all increasing 
functions f, then $P\{X > t\} \ge P\{Y > t\}$, define an appropriate 
increasing function f. 


7.10. A coin having probability p of landing on heads is 
flipped n times. Compute the expected number of runs of 
heads of size 1, of size 2, and of size k, $1 \le k \le n$. 


7.11. Let $X_1, X_2, \ldots, X_n$ be independent and identically 
distributed positive random variables. For $k \le n$, find 


7.12. Consider n independent trials, each resulting in 
any one of r possible outcomes with probabilities 
$P_1, P_2, \ldots, P_r$. Let X denote the number of outcomes that 
never occur in any of the trials. Find E[X] and show 
that among all probability vectors $P_1, \ldots, P_r$, E[X] is minimized 
when $P_i = 1/r$, i = 1, ..., r. 


7.13. Let $X_1, X_2, \ldots$ be a sequence of independent random 
variables having the probability mass function 

$$P\{X_n = 0\} = P\{X_n = 2\} = 1/2, \quad n \ge 1$$

The random variable $X = \sum_{n=1}^{\infty} X_n / 3^n$ is said to have the 
Cantor distribution. Find E[X] and Var(X). 


7.14. Let $X_1, \ldots, X_n$ be independent and identically distributed 
continuous random variables. We say that a 
record value occurs at time j, $j \le n$, if $X_j \ge X_i$ for all 
$1 \le i \le j$. Show that 

(a) $E[\text{number of record values}] = \sum_{j=1}^{n} 1/j$; 

(b) $\mathrm{Var}(\text{number of record values}) = \sum_{j=1}^{n} (j - 1)/j^2$. 
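Both record-value formulas can be confirmed exactly for small n, since for independent continuous variables every relative ordering of the n values is equally likely and ties have probability zero. The sketch below enumerates all n! orderings; `record_count` and `record_moments` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import permutations


def record_count(seq):
    """Number of entries that exceed every earlier entry (the first always counts)."""
    best = None
    records = 0
    for value in seq:
        if best is None or value > best:
            best = value
            records += 1
    return records


def record_moments(n):
    """Exact mean and variance of the record count over all n! relative orders."""
    counts = [record_count(p) for p in permutations(range(n))]
    mean = Fraction(sum(counts), len(counts))
    second = Fraction(sum(c * c for c in counts), len(counts))
    return mean, second - mean ** 2


n = 6
mean, var = record_moments(n)
print(mean == sum(Fraction(1, j) for j in range(1, n + 1)))
print(var == sum(Fraction(j - 1, j * j) for j in range(1, n + 1)))
```

Exact rational arithmetic is used so that the comparison with the two sums is an equality test rather than a floating-point approximation.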
7.15. For Example 2i, show that the variance of the number 
of coupons needed to amass a full set is equal to 

$$\sum_{i=1}^{N-1} \frac{iN}{(N - i)^2}$$

When N is large, this can be shown to be approximately 
equal (in the sense that their ratio approaches 1 as $N \to \infty$) 
to $N^2 \pi^2 / 6$. 
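The large-N approximation can be explored numerically. The sketch below assumes the variance expression is the sum of iN/(N − i)² over i = 1, ..., N − 1 and prints its ratio to N²π²/6 for increasing N; the function name `coupon_variance` is a hypothetical helper.

```python
import math


def coupon_variance(n_types):
    """Sum over i = 1..N-1 of i*N / (N - i)^2, the variance expression in the exercise."""
    n = n_types
    return sum(i * n / (n - i) ** 2 for i in range(1, n))


for n in (10, 100, 1000, 10000):
    ratio = coupon_variance(n) / (n ** 2 * math.pi ** 2 / 6)
    print(n, ratio)
```

The printed ratios should climb toward 1 from below, which is consistent with the limiting statement but converges fairly slowly.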


7.16. Consider n independent trials, the ith of which results 
in a success with probability $P_i$. 


(a) Compute the expected number of successes in the n 
trials; call it μ. 


(b) For a fixed value of μ, what choice of $P_1, \ldots, P_n$ maximizes 
the variance of the number of successes? 


(c) What choice minimizes the variance? 


*7.17. Suppose that each of the elements of S = {1, 2, ..., n} 
is to be colored either red or blue. Show that if $A_1, \ldots, A_r$ 
are subsets of S, there is a way of doing the coloring so 
that at most $\sum_{i=1}^{r} (1/2)^{|A_i| - 1}$ of these subsets have all their 
elements the same color (where |A| denotes the number 
of elements in the set A). 


7.18. Suppose that $X_1$ and $X_2$ are independent random 
variables having a common mean μ. Suppose also that 
$\mathrm{Var}(X_1) = \sigma_1^2$ and $\mathrm{Var}(X_2) = \sigma_2^2$. The value of μ is 
unknown, and it is proposed that μ be estimated by a 
weighted average of $X_1$ and $X_2$. That is, $\lambda X_1 + (1 - \lambda) X_2$ 
will be used as an estimate of μ for some appropriate value 
of λ. Which value of λ yields the estimate having the lowest 
possible variance? Explain why it is desirable to use 
this value of λ. 


7.19. In Example 4f, we showed that the covariance of 
the multinomial random variables $N_i$ and $N_j$ is equal to 
$-mP_iP_j$ by expressing $N_i$ and $N_j$ as the sum of indicator 
variables. We could also have obtained that result by using 
the formula 

$$\mathrm{Var}(N_i + N_j) = \mathrm{Var}(N_i) + \mathrm{Var}(N_j) + 2\,\mathrm{Cov}(N_i, N_j)$$

(a) What is the distribution of $N_i + N_j$? 


(b) Use the preceding identity to show that $\mathrm{Cov}(N_i, N_j) = -mP_iP_j$. 


7.20. Show that if X and Y are identically distributed and not 
necessarily independent, then 

$$\mathrm{Cov}(X + Y, X - Y) = 0$$


7.21. The Conditional Covariance Formula. The conditional 
covariance of X and Y, given Z, is defined by 

$$\mathrm{Cov}(X, Y \mid Z) = E[(X - E[X \mid Z])(Y - E[Y \mid Z]) \mid Z]$$

(a) Show that 

$$\mathrm{Cov}(X, Y \mid Z) = E[XY \mid Z] - E[X \mid Z]\,E[Y \mid Z]$$

(b) Prove the conditional covariance formula 

$$\mathrm{Cov}(X, Y) = E[\mathrm{Cov}(X, Y \mid Z)] + \mathrm{Cov}(E[X \mid Z], E[Y \mid Z])$$

(c) Set X = Y in part (b) and obtain the conditional vari- 
ance formula. 


7.22. Let $X_{(i)}$, i = 1,...,n, denote the order statistics from
a set of n uniform (0, 1) random variables, and note that
the density function of $X_{(i)}$ is given by

$$f(x) = \frac{n!}{(i-1)!\,(n-i)!}\, x^{i-1}(1-x)^{n-i}, \qquad 0 < x < 1$$

(a) Compute $\mathrm{Var}(X_{(i)})$, i = 1,...,n.
(b) Which value of i minimizes, and which value maximizes,
$\mathrm{Var}(X_{(i)})$?


7.23. Show that if Y = a + bX, then

$$\rho(X, Y) = \begin{cases} +1 & \text{if } b > 0 \\ -1 & \text{if } b < 0 \end{cases}$$


7.24. Show that if Z is a standard normal random variable
and if Y is defined by $Y = a + bZ + cZ^2$, then

$$\rho(Y, Z) = \frac{b}{\sqrt{b^2 + 2c^2}}$$


7.25. Prove the Cauchy-Schwarz inequality, namely,
$$(E[XY])^2 \le E[X^2]E[Y^2]$$

Hint: Unless Y = -tX for some constant, in which case
the inequality holds with equality, it follows that for all t,

$$0 < E[(tX + Y)^2] = E[X^2]t^2 + 2E[XY]t + E[Y^2]$$
Hence, the roots of the quadratic equation
$$E[X^2]t^2 + 2E[XY]t + E[Y^2] = 0$$


must be imaginary, which implies that the discriminant of 
this quadratic equation must be negative. 


7.26. Show that if X and Y are independent, then

$$E[X|Y = y] = E[X] \quad \text{for all } y$$
(a) in the discrete case;
(b) in the continuous case.


7.27. Prove that E[g(X)Y|X] = g(X)E[Y|X]. 


7.28. Prove that if E[Y|X = x] = E[Y] for all x, then X 
and Y are uncorrelated; give a counterexample to show 
that the converse is not true. 

Hint: Prove and use the fact that E[XY] = E[XE[Y|X]]. 


7.29. Show that Cov(X, E[Y|X]) = Cov(X, Y).


7.30. Let $X_1, \ldots, X_n$ be independent and identically distributed
random variables. Find

$$E[X_1 \,|\, X_1 + \cdots + X_n = x]$$


7.31. Consider Example 4f, which is concerned with the
multinomial distribution. Use conditional expectation to
compute $E[N_iN_j]$, and then use this to verify the formula
for $\mathrm{Cov}(N_i, N_j)$ given in Example 4f.


7.32. An urn initially contains b black and w white balls.
At each stage, we add r black balls and then withdraw,
at random, r balls from the b + w + r balls in the urn.
Show that

$$E[\text{number of white balls after stage } t] = \left(\frac{b + w}{b + w + r}\right)^{t} w$$


7.33. For an event A, let $I_A$ equal 1 if A occurs and
let it equal 0 if A does not occur. For a random
variable X, show that

$$E[X|A] = \frac{E[XI_A]}{P(A)}$$


7.34. A coin that lands on heads with probability p is continually
flipped. Compute the expected number of flips
that are made until a string of r heads in a row is obtained.

Hint: Condition on the time of the first occurrence of tails
to obtain the equation

$$E[X] = (1 - p)\sum_{i=1}^{r} p^{i-1}(i + E[X]) + (1 - p)\sum_{i=r+1}^{\infty} p^{i-1} r$$
Simplify and solve for E[X].


7.35. For another approach to Theoretical Exercise 7.34,
let $T_r$ denote the number of flips required to obtain a run
of r consecutive heads.

(a) Determine $E[T_r | T_{r-1}]$.

(b) Determine $E[T_r]$ in terms of $E[T_{r-1}]$.
(c) What is $E[T_1]$?

(d) What is $E[T_r]$?


7.36. The probability generating function of the discrete
nonnegative integer valued random variable X having
probability mass function $p_j$, $j \ge 0$, is defined by

$$\phi(s) = E[s^X] = \sum_{j=0}^{\infty} p_j s^j$$

Let Y be a geometric random variable with parameter
p = 1 - s, where 0 < s < 1. Suppose that Y is independent
of X, and show that

$$\phi(s) = P\{X < Y\}$$


7.37. One ball at a time is randomly selected from an
urn containing a white and b black balls until all of the
remaining balls are of the same color. Let $M_{a,b}$ denote the
expected number of balls left in the urn when the experiment
ends. Compute a recursive formula for $M_{a,b}$ and
solve when a = 3 and b = 5.


7.38. An urn contains a white and b black balls. After a
ball is drawn, it is returned to the urn if it is white; but if it
is black, it is replaced by a white ball from another urn. Let
$M_n$ denote the expected number of white balls in the urn
after the foregoing operation has been repeated n times.

(a) Derive the recursive equation

$$M_{n+1} = \left(1 - \frac{1}{a + b}\right)M_n + 1$$

(b) Use part (a) to prove that

$$M_n = a + b - b\left(1 - \frac{1}{a + b}\right)^{n}$$

(c) What is the probability that the (n + 1)st ball drawn is
white?


7.39. The best linear predictor of Y with respect to $X_1$ and
$X_2$ is equal to $a + bX_1 + cX_2$, where a, b, and c are chosen
to minimize

$$E[(Y - (a + bX_1 + cX_2))^2]$$
Determine a, b, and c.


7.40. The best quadratic predictor of Y with respect to X
is $a + bX + cX^2$, where a, b, and c are chosen to minimize
$E[(Y - (a + bX + cX^2))^2]$. Determine a, b, and c.


7.41. Use the conditional variance formula to determine 
the variance of a geometric random variable X having 
parameter p. 


7.42. Let X be a normal random variable with parameters
$\mu = 0$ and $\sigma^2 = 1$, and let I, independent of X, be such
that $P\{I = 1\} = \frac{1}{2} = P\{I = 0\}$. Now define Y by

$$Y = \begin{cases} X & \text{if } I = 1 \\ -X & \text{if } I = 0 \end{cases}$$

In words, Y is equally likely to equal either X or -X.

(a) Are X and Y independent?

(b) Are I and Y independent?

(c) Show that Y is normal with mean 0 and variance 1. 
(d) Show that Cov(X, Y) = 0. 


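7.43. It follows from Proposition 6.1 and the fact that the
best linear predictor of Y with respect to X is
$\mu_y + \rho\frac{\sigma_y}{\sigma_x}(X - \mu_x)$ that if

$$E[Y|X] = a + bX$$

then
$$a = \mu_y - \rho\frac{\sigma_y}{\sigma_x}\mu_x, \qquad b = \rho\frac{\sigma_y}{\sigma_x}$$

(Why?) Verify this directly.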


7.44. Show that for random variables X and Z,
$$E[(X - Y)^2] = E[X^2] - E[Y^2]$$

where
$$Y = E[X|Z]$$


7.45. Consider a population consisting of individuals able
to produce offspring of the same kind. Suppose that by
the end of its lifetime, each individual will have produced
j new offspring with probability $P_j$, $j \ge 0$, independently
of the number produced by any other individual. The
number of individuals initially present, denoted by $X_0$,
is called the size of the zeroth generation. All offspring
of the zeroth generation constitute the first generation,
and their number is denoted by $X_1$. In general, let $X_n$
denote the size of the nth generation. Let $\mu = \sum_{j=0}^{\infty} jP_j$ and
$\sigma^2 = \sum_{j=0}^{\infty} (j - \mu)^2 P_j$ denote, respectively, the mean and the
variance of the number of offspring produced by a single
individual. Suppose that $X_0 = 1$; that is, initially there is
a single individual in the population.


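(a) Show that
$$E[X_n] = \mu E[X_{n-1}]$$

(b) Use part (a) to conclude that
$$E[X_n] = \mu^n$$
(c) Show that
$$\mathrm{Var}(X_n) = \sigma^2\mu^{n-1} + \mu^2\,\mathrm{Var}(X_{n-1})$$
(d) Use part (c) to conclude that
$$\mathrm{Var}(X_n) = \begin{cases} \sigma^2\mu^{n-1}\left(\dfrac{\mu^n - 1}{\mu - 1}\right) & \text{if } \mu \ne 1 \\ n\sigma^2 & \text{if } \mu = 1 \end{cases}$$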

The model just described is known as a branching process,
and an important question for a population that evolves
along such lines is the probability that the population will
eventually die out. Let $\pi$ denote this probability when the
population starts with a single individual. That is,

$$\pi = P\{\text{population eventually dies out} \,|\, X_0 = 1\}$$

(e) Argue that $\pi$ satisfies
$$\pi = \sum_{j=0}^{\infty} P_j \pi^j$$


Hint: Condition on the number of offspring of the initial 
member of the population. 


7.46. Verify the formula for the moment generating function
of a uniform random variable that is given in Table 7.2.
Also, differentiate to verify the formulas for the mean and 
variance. 


7.47. For a standard normal random variable Z, let $\mu_n = E[Z^n]$.
Show that

$$\mu_n = \begin{cases} 0 & \text{when } n \text{ is odd} \\ \dfrac{(2j)!}{2^j j!} & \text{when } n = 2j \end{cases}$$

Hint: Start by expanding the moment generating function
of Z into a Taylor series about 0 to obtain

$$E[e^{tZ}] = e^{t^2/2} = \sum_{j=0}^{\infty} \frac{(t^2/2)^j}{j!}$$


7.48. Let X be a normal random variable with mean $\mu$ and
variance $\sigma^2$. Use the results of Theoretical Exercise 7.47 to
show that

$$E[X^n] = \sum_{j=0}^{[n/2]} \binom{n}{2j} \frac{\mu^{n-2j}\sigma^{2j}(2j)!}{2^j j!}$$

In the preceding equation, [n/2] is the largest integer less
than or equal to n/2. Check your answer by letting n = 1
and n = 2.


7.49. If Y = aX + b, where a and b are constants, express 
the moment generating function of Y in terms of the 
moment generating function of X. 


7.50. The positive random variable X is said to be a lognormal
random variable with parameters $\mu$ and $\sigma^2$ if log(X)
is a normal random variable with mean $\mu$ and variance $\sigma^2$.
Use the normal moment generating function to find the
mean and variance of a lognormal random variable.


7.51. Let X have moment generating function M(t), and
define $\Psi(t) = \log M(t)$. Show that

$$\Psi''(t)\big|_{t=0} = \mathrm{Var}(X)$$


7.52. Use Table 7.2 to determine the distribution of $\sum_{i=1}^{n} X_i$
when $X_1, \ldots, X_n$ are independent and identically
distributed exponential random variables, each having
mean $1/\lambda$.


7.53. Show how to compute Cov(X, Y) from the joint
moment generating function of X and Y.


7.54. Suppose that $X_1, \ldots, X_n$ have a multivariate normal
distribution. Show that $X_1, \ldots, X_n$ are independent random
variables if and only if

$$\mathrm{Cov}(X_i, X_j) = 0 \quad \text{when } i \ne j$$


7.55. If Z is a standard normal random variable, what is
$\mathrm{Cov}(Z, Z^2)$?


7.56. Suppose that Y is a normal random variable with
mean $\mu$ and variance $\sigma^2$, and suppose also that the conditional
distribution of X, given that Y = y, is normal with
mean y and variance 1.

(a) Argue that the joint distribution of X, Y is the same
as that of Y + Z, Y when Z is a standard normal random
variable that is independent of Y.

(b) Use the result of part (a) to argue that X, Y has a
bivariate normal distribution.

(c) Find E[X], Var(X), and Corr(X, Y).

(d) Find E[Y|X = x].

(e) What is the conditional distribution of Y given that
X = x?


Self-Test Problems and Exercises


7.1. Consider a list of m names, where the same name may
appear more than once on the list. Let n(i), i = 1,...,m,
denote the number of times that the name in position i
appears on the list, and let d denote the number of distinct
names on the list.

(a) Express d in terms of the variables m, n(i), i = 1,...,m.
Let U be a uniform (0, 1) random variable, and let X =
[mU] + 1.

(b) What is the probability mass function of X?

(c) Argue that E[m/n(X)] = d.


7.2. An urn has n white and m black balls that are removed
one at a time in a randomly chosen order. Find the
expected number of instances in which a white ball is
immediately followed by a black one.


7.3. Twenty individuals consisting of 10 married couples
are to be seated at 5 different tables, with 4 people at each
table.

(a) If the seating is done “at random,” what is the expected
number of married couples that are seated at the same
table?

(b) If 2 men and 2 women are randomly chosen to be
seated at each table, what is the expected number of married
couples that are seated at the same table?


7.4. If a die is to be rolled until all sides have appeared at 
least once, find the expected number of times that outcome 
1 appears. 


7.5. A deck of 2n cards consists of n red and n black cards. 
The cards are shuffled and then turned over one at a time. 
Suppose that each time a red card is turned over, we win 
1 unit if more red cards than black cards have been turned 
over by that time. (For instance, if n = 2 and the result
is rbrb, then we would win a total of 2 units.) Find the
expected amount that we win. 


7.6. Let $A_1, A_2, \ldots, A_n$ be events, and let N denote the
number of them that occur. Also, let I = 1 if all of these
events occur, and let it be 0 otherwise. Prove Bonferroni’s
inequality, namely,

$$P(A_1 \cdots A_n) \ge \sum_{i=1}^{n} P(A_i) - (n - 1)$$

Hint: Argue first that $N \le n - 1 + I$.


7.7. Let X be the smallest value obtained when k numbers
are randomly chosen from the set 1,...,n. Find E[X]
by interpreting X as a negative hypergeometric random
variable.


7.8. An arriving plane carries r families. A total of $n_j$ of
these families have checked in a total of j pieces of luggage,
$\sum_j n_j = r$. Suppose that when the plane lands, the
$N = \sum_j j n_j$ pieces of luggage come out of the plane in a random
order. As soon as a family collects all of its luggage,
it immediately departs the airport. If the Sanchez family
checked in j pieces of luggage, find the expected number
of families that depart after they do.


*7.9. Nineteen items on the rim of a circle of radius 1 are
to be chosen. Show that for any choice of these points,
there will be an arc of (arc) length 1 that contains at least
4 of them.


7.10. Let X be a Poisson random variable with mean $\lambda$.
Show that if $\lambda$ is not too small, then

$$\mathrm{Var}(\sqrt{X}) \approx .25$$

Hint: Use the result of Theoretical Exercise 7.4 to approximate
$E[\sqrt{X}]$.


7.11. Suppose in Self-Test Problem 7.3 that the 20 people
are to be seated at seven tables, three of which have 4 seats
and four of which have 2 seats. If the people are randomly
seated, find the expected value of the number of married
couples that are seated at the same table.


7.12. Individuals 1 through n, n > 1, are to be recruited
into a firm in the following manner: Individual 1 starts
the firm and recruits individual 2. Individuals 1 and 2 will
then compete to recruit individual 3. Once individual 3 is
recruited, individuals 1, 2, and 3 will compete to recruit
individual 4, and so on. Suppose that when individuals
1, 2, ..., i compete to recruit individual i + 1, each of them
is equally likely to be the successful recruiter.


(a) Find the expected number of the individuals 1,...,n
who did not recruit anyone else.

(b) Derive an expression for the variance of the number of
individuals who did not recruit anyone else, and evaluate
it for n = 5.


7.13. The nine players on a basketball team consist of 2
centers, 3 forwards, and 4 backcourt players. If the play- 
ers are paired up at random into three groups of size 3 
each, find (a) the expected value and (b) the variance of 
the number of triplets consisting of one of each type of 
player. 


7.14. A deck of 52 cards is shuffled and a bridge hand of 
13 cards is dealt out. Let X and Y denote, respectively, the 
number of aces and the number of spades in the hand. 


(a) Show that X and Y are uncorrelated.
(b) Are they independent? 


7.15. Each coin in a bin has a value attached to it. Each 
time that a coin with value p is flipped, it lands on heads 
with probability p. When a coin is randomly chosen from 
the bin, its value is uniformly distributed on (0, 1). Sup- 
pose that after the coin is chosen but before it is flipped, 
you must predict whether it will land on heads or on tails. 
You will win 1 if you are correct and will lose 1 otherwise. 


(a) What is your expected gain if you are not told the value 
of the coin? 


(b) Suppose now that you are allowed to inspect the coin 
before it is flipped, with the result of your inspection being 
that you learn the value of the coin. As a function of p, the 
value of the coin, what prediction should you make? 


(c) Under the conditions of part (b), what is your expected 
gain? 


7.16. In Self-Test Problem 7.1, we showed how to use the 
value of a uniform (0, 1) random variable (commonly 
called a random number) to obtain the value of a random 
variable whose mean is equal to the expected number of 
distinct names on a list. However, its use required that one 
choose a random position and then determine the num- 
ber of times that the name in that position appears on the 
list. Another approach, which can be more efficient when 
there is a large amount of replication of names, is as follows:
As before, start by choosing the random variable X
as in Problem 7.1. Now identify the name in position X,
and then go through the list, starting at the beginning, until
that name appears. Let I equal 0 if you encounter that
name before getting to position X, and let I equal 1 if your
first encounter with the name is at position X. Show that
E[mI] = d.

Hint: Compute E[I] by using conditional expectation.


7.17. A total of m items are to be sequentially distributed 
among n cells, with each item independently being put in 
cell j with probability $p_j$, j = 1,...,n. Find the expected
number of collisions that occur, where a collision occurs 
whenever an item is put into a nonempty cell. 


7.18. Let X be the length of the initial run in a random
ordering of n ones and m zeros. That is, if the first k values
are the same (either all ones or all zeros), then X = k. Find
E[X].


7.19. There are n items in a box labeled H and m in a box 
labeled T. A coin that comes up heads with probability p
and tails with probability 1 — p is flipped. Each time it 
comes up heads, an item is removed from the H box, and 
each time it comes up tails, an item is removed from the 
T box. (If a box is empty and its outcome occurs, then no 
items are removed.) Find the expected number of coin flips 
needed for both boxes to become empty. 

Hint: Condition on the number of heads in the first n + m 
flips. 


7.20. Let X be a nonnegative random variable having distribution
function F. Show that if $\bar{F}(x) = 1 - F(x)$, then

$$E[X^n] = \int_0^{\infty} n x^{n-1}\bar{F}(x)\,dx$$
Hint: Start with the identity

$$X^n = n\int_0^{X} x^{n-1}\,dx = n\int_0^{\infty} x^{n-1} I_X(x)\,dx$$

where
$$I_X(x) = \begin{cases} 1, & \text{if } x < X \\ 0, & \text{otherwise} \end{cases}$$


*7.21. Let $a_1, \ldots, a_n$, not all equal to 0, be such that
$\sum_{i=1}^{n} a_i = 0$. Show that there is a permutation $i_1, \ldots, i_n$
such that $\sum_{j=1}^{n-1} a_{i_j} a_{i_{j+1}} < 0$.

Hint: Use the probabilistic method. (It is interesting that
there need not be a permutation whose sum of products
of successive pairs is positive. For instance, if n = 3,
$a_1 = a_2 = -1$, and $a_3 = 2$, there is no such permutation.)


7.22. Suppose that $X_i$, i = 1,2,3, are independent Poisson
random variables with respective means $\lambda_i$, i = 1,2,3. Let
$X = X_1 + X_2$ and $Y = X_2 + X_3$. The random vector X, Y
is said to have a bivariate Poisson distribution.


(a) Find E[X] and E[Y].

(b) Find Cov(X, Y).

(c) Find the joint probability mass function P{X = i,
Y = j}.

7.23. Let $(X_i, Y_i)$, i = 1,..., be a sequence of independent
and identically distributed random vectors. That is, $X_1, Y_1$
is independent of, and has the same distribution as, $X_2, Y_2$,
and so on. Although $X_i$ and $Y_i$ can be dependent, $X_i$ and
$Y_j$ are independent when $i \ne j$. Let

$$\mu_x = E[X_i],\quad \mu_y = E[Y_i],\quad \sigma_x^2 = \mathrm{Var}(X_i),\quad \sigma_y^2 = \mathrm{Var}(Y_i),\quad \rho = \mathrm{Corr}(X_i, Y_i)$$


Find $\mathrm{Corr}\left(\sum_{i=1}^{n+1} X_i, \sum_{j=1}^{n} Y_j\right)$.


7.24. Three cards are randomly chosen without replace- 
ment from an ordinary deck of 52 cards. Let X denote the 
number of aces chosen. 


(a) Find E[X | the ace of spades is chosen].
(b) Find E[X | at least one ace is chosen].


7.25. Let $\Phi$ be the standard normal distribution function,
and let X be a normal random variable with mean $\mu$ and
variance 1. We want to find $E[\Phi(X)]$. To do so, let Z be
a standard normal random variable that is independent of
X, and let

$$I = \begin{cases} 1, & \text{if } Z < X \\ 0, & \text{if } Z \ge X \end{cases}$$

(a) Show that $E[I|X = x] = \Phi(x)$.
(b) Show that $E[\Phi(X)] = P\{Z < X\}$.
(c) Show that $E[\Phi(X)] = \Phi\left(\frac{\mu}{\sqrt{2}}\right)$.

Hint: What is the distribution of X - Z?


The preceding comes up in statistics. Suppose you are
about to observe the value of a random variable X that
is normally distributed with an unknown mean $\mu$ and variance
1, and suppose that you want to test the hypothesis
that the mean $\mu$ is greater than or equal to 0. Clearly
you would want to reject this hypothesis if X is sufficiently
small. If it results that X = x, then the p-value
of the hypothesis that the mean is greater than
or equal to 0 is defined to be the probability that X
would be as small as x if $\mu$ were equal to 0 (its smallest
possible value if the hypothesis were true). (A small
p-value is taken as an indication that the hypothesis is
probably false.) Because X has a standard normal distribution
when $\mu = 0$, the p-value that results when
X = x is $\Phi(x)$. Therefore, the preceding shows that the
expected p-value that results when the true mean is $\mu$
is $\Phi\left(\frac{\mu}{\sqrt{2}}\right)$.

7.26. A coin that comes up heads with probability p is 
flipped until either a total of n heads or of m tails is 
amassed. Find the expected number of flips. 

Hint: Imagine that one continues to flip even after the 
goal is attained. Let X denote the number of flips needed 
to obtain n heads, and let Y denote the number of 
flips needed to obtain m tails. Note that max(X,Y) + 
min(X, Y) = X + Y. Compute E[max(X, Y)] by conditioning
on the number of heads in the first n + m - 1
flips.


7.27. A deck of n cards numbered 1 through n, initially in
any arbitrary order, is shuffled in the following manner: At
each stage, we randomly choose one of the cards and move
it to the front of the deck, leaving the relative positions
of the other cards unchanged. This procedure is continued
until all but one of the cards has been chosen. At this point,
it follows by symmetry that all n! possible orderings are
equally likely. Find the expected number of stages that are
required.


7.28. Suppose that a sequence of independent trials in 
which each trial is a success with probability p is per- 
formed until either a success occurs or a total of n trials 
has been reached. Find the mean number of trials that are 
performed. 


Hint: The computations are simplified if you use the
identity that for a nonnegative integer valued random
variable X,

$$E[X] = \sum_{i=1}^{\infty} P\{X \ge i\}$$


7.29. Suppose that X and Y are both Bernoulli random 
variables. Show that X and Y are independent if and only 
if Cov(X, Y) = 0.


7.30. In the generalized match problem, there are n individuals
of whom $n_i$ wear hat size i, $\sum_{i=1}^{r} n_i = n$. There are
also n hats, of which $h_i$ are of size i, $\sum_{i=1}^{r} h_i = n$. If each
individual randomly chooses a hat (without replacement),
find the expected number who choose a hat that is their
size.


7.31. For random variables X and Y, show that
$$\sqrt{\mathrm{Var}(X + Y)} \le \sqrt{\mathrm{Var}(X)} + \sqrt{\mathrm{Var}(Y)}$$


That is, show that the standard deviation of a sum is 
always less than or equal to the sum of the standard 
deviations. 


7.32. Let $R_1, \ldots, R_{n+m}$ be a random permutation of
1,...,n + m. (That is, $R_1, \ldots, R_{n+m}$ is equally likely to be
any of the (n + m)! permutations of 1,...,n + m.) For
a given $i \le n$, let X be the ith smallest of the values
$R_1, \ldots, R_n$. Show that $E[X] = i + m\frac{i}{n+1}$.

Hint: Note that if we let $I_{n+k}$ equal 1 if $R_{n+k} < X$ and let
it equal 0 otherwise, that

$$X = i + \sum_{k=1}^{m} I_{n+k}$$


7.33. Suppose that Y is uniformly distributed over (0, 1),
and that the conditional distribution of X, given that Y =
y, is uniform over (0, y).

(a) Find E[X].

(b) Find Cov(X, Y).

(c) Find Var(X).

(d) Find $P\{X \le x\}$.

(e) Find the probability density function of X.


8.1 Introduction


The most important theoretical results in probability theory are limit theorems. Of 
these, the most important are those classified either under the heading laws of large 
numbers or under the heading central limit theorems. Usually, theorems are consid- 
ered to be laws of large numbers if they are concerned with stating conditions under 
which the average of a sequence of random variables converges (in some sense) to 
the expected average. By contrast, central limit theorems are concerned with deter- 
mining conditions under which the sum of a large number of random variables has a 
probability distribution that is approximately normal. 


8.2 Chebyshev’s Inequality and the Weak Law of Large Numbers 


We start this section by proving a result known as Markov’s inequality. 


Proposition 2.1 Markov’s inequality

If X is a random variable that takes only nonnegative values, then for any value
a > 0,
$$P\{X \ge a\} \le \frac{E[X]}{a}$$
Proof For a > 0, let
$$I = \begin{cases} 1 & \text{if } X \ge a \\ 0 & \text{otherwise} \end{cases}$$
and note that, since $X \ge 0$,
$$I \le \frac{X}{a}$$
Taking expectations of the preceding inequality yields
$$E[I] \le \frac{E[X]}{a}$$
which, because $E[I] = P\{X \ge a\}$, proves the result.


As a corollary, we obtain Proposition 2.2. 


Proposition 2.2 Chebyshev’s inequality

If X is a random variable with finite mean $\mu$ and variance $\sigma^2$, then for any value
k > 0,
$$P\{|X - \mu| \ge k\} \le \frac{\sigma^2}{k^2}$$

Proof Since $(X - \mu)^2$ is a nonnegative random variable, we can apply Markov’s
inequality (with $a = k^2$) to obtain

$$P\{(X - \mu)^2 \ge k^2\} \le \frac{E[(X - \mu)^2]}{k^2} \qquad (2.1)$$

But since $(X - \mu)^2 \ge k^2$ if and only if $|X - \mu| \ge k$, Equation (2.1) is equivalent to

$$P\{|X - \mu| \ge k\} \le \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2}$$

and the proof is complete.


The importance of Markov’s and Chebyshev’s inequalities is that they enable 
us to derive bounds on probabilities when only the mean or both the mean and the 
variance of the probability distribution are known. Of course, if the actual distribu- 
tion were known, then the desired probabilities could be computed exactly and we 
would not need to resort to bounds. 


Suppose that it is known that the number of items produced in a factory during a 
week is a random variable with mean 50. 
(a) What can be said about the probability that this week’s production will 
exceed 75? 
(b) If the variance of a week’s production is known to equal 25, then what can 
be said about the probability that this week’s production will be between 40 
and 60? 


Solution Let X be the number of items that will be produced in a week. 


(a) By Markov’s inequality,

$$P\{X > 75\} \le \frac{E[X]}{75} = \frac{50}{75} = \frac{2}{3}$$

(b) By Chebyshev’s inequality,

$$P\{|X - 50| \ge 10\} \le \frac{\sigma^2}{10^2} = \frac{1}{4}$$

Hence,
$$P\{|X - 50| < 10\} \ge 1 - \frac{1}{4} = \frac{3}{4}$$
so the probability that this week’s production will be between 40 and 60 is at
least .75.
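
The two bounds in this example can be checked with exact arithmetic. The following Python sketch is only an illustration; the helper names `markov_bound` and `chebyshev_bound` are ours and do not appear in the text.

```python
from fractions import Fraction

def markov_bound(mean, a):
    """Markov upper bound on P{X >= a} for a nonnegative X with the given mean."""
    if a <= 0:
        raise ValueError("a must be positive")
    return min(Fraction(1), Fraction(mean) / Fraction(a))

def chebyshev_bound(variance, k):
    """Chebyshev upper bound on P{|X - mu| >= k} for X with the given variance."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(Fraction(1), Fraction(variance) / Fraction(k) ** 2)

part_a = markov_bound(50, 75)     # bound on P{X > 75}
part_b = chebyshev_bound(25, 10)  # bound on P{|X - 50| >= 10}
print(part_a, part_b, 1 - part_b)
```

The last printed value, 1 minus the Chebyshev bound, is the lower bound on the probability that production falls strictly between 40 and 60.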


As Chebyshev’s inequality is valid for all distributions of the random variable 
X, we cannot expect the bound on the probability to be very close to the actual 
probability in most cases. For instance, consider Example 2b. 


If X is uniformly distributed over the interval (0, 10), then, since E[X] = 5 and
$\mathrm{Var}(X) = \frac{25}{3}$, it follows from Chebyshev’s inequality that

$$P\{|X - 5| > 4\} \le \frac{25}{3(16)} \approx .52$$

whereas the exact result is
$$P\{|X - 5| > 4\} = .20$$

Thus, although Chebyshev’s inequality is correct, the upper bound that it provides is 
not particularly close to the actual probability. 

Similarly, if X is a normal random variable with mean $\mu$ and variance $\sigma^2$,
Chebyshev’s inequality states that

$$P\{|X - \mu| > 2\sigma\} \le \frac{1}{4}$$
whereas the actual probability is given by
$$P\{|X - \mu| > 2\sigma\} = P\left\{\left|\frac{X - \mu}{\sigma}\right| > 2\right\} = 2[1 - \Phi(2)] \approx .0456$$
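
The gap between the Chebyshev bound and the exact probability in the two cases above can be reproduced numerically. This is a minimal sketch using Python's standard `statistics.NormalDist` for $\Phi$; the variable names are our own.

```python
from statistics import NormalDist

# Uniform (0, 10): mean 5, variance 10^2 / 12 = 25/3.
var_uniform = 10 ** 2 / 12
cheb_uniform = var_uniform / 4 ** 2            # bound on P{|X - 5| > 4}
exact_uniform = (1 - 0) / 10 + (10 - 9) / 10   # P{X < 1} + P{X > 9}

# Normal: P{|X - mu| > 2 sigma} does not depend on mu or sigma.
cheb_normal = 1 / 2 ** 2
exact_normal = 2 * (1 - NormalDist().cdf(2))

print(round(cheb_uniform, 4), round(exact_uniform, 4))
print(round(cheb_normal, 4), round(exact_normal, 4))
```

The normal value computed this way is about .0455; the figure .0456 in the text comes from rounding $\Phi(2)$ to four decimal places before subtracting.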


Chebyshev’s inequality is often used as a theoretical tool in proving results. This 
use is illustrated first by Proposition 2.3 and then, most importantly, by the weak law 
of large numbers. 


Proposition 2.3

If Var(X) = 0, then
$$P\{X = E[X]\} = 1$$


In other words, the only random variables having variances equal to 0 are those that 
are constant with probability 1. 


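Proof By Chebyshev’s inequality, we have, for any $n \ge 1$,
$$P\left\{|X - \mu| > \frac{1}{n}\right\} = 0$$
Letting $n \to \infty$ and using the continuity property of probability yields

$$0 = \lim_{n \to \infty} P\left\{|X - \mu| > \frac{1}{n}\right\} = P\left\{\lim_{n \to \infty}\left\{|X - \mu| > \frac{1}{n}\right\}\right\} = P\{X \ne \mu\}$$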


and the result is established. 


Theorem 2.1 The weak law of large numbers

Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables,
each having finite mean $E[X_i] = \mu$. Then, for any $\varepsilon > 0$,

$$P\left\{\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \ge \varepsilon\right\} \to 0 \quad \text{as } n \to \infty$$

Proof We shall prove the theorem only under the additional assumption that the
random variables have a finite variance $\sigma^2$. Now, since

$$E\left[\frac{X_1 + \cdots + X_n}{n}\right] = \mu \quad \text{and} \quad \mathrm{Var}\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{\sigma^2}{n}$$

it follows from Chebyshev’s inequality that

$$P\left\{\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \ge \varepsilon\right\} \le \frac{\sigma^2}{n\varepsilon^2}$$

and the result is proven.
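
To see the weak law at work, one can estimate the probability in Theorem 2.1 by simulation and compare it with the Chebyshev bound $\sigma^2/(n\varepsilon^2)$ used in the proof. The sketch below assumes uniform (0, 1) summands (so $\mu = 1/2$ and $\sigma^2 = 1/12$); the function names, the number of trials, and the seed are illustrative choices of ours.

```python
import random

def fraction_far_from_mean(n, eps, trials=1000, seed=0):
    """Estimate P{|sample mean - mu| >= eps} for n uniform (0, 1) draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    mu = 0.5
    far = 0
    for _ in range(trials):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        if abs(sample_mean - mu) >= eps:
            far += 1
    return far / trials

def chebyshev_bound(n, eps, variance=1 / 12):
    """The bound sigma^2 / (n eps^2) from the proof, capped at 1."""
    return min(1.0, variance / (n * eps ** 2))

for n in (10, 100, 1000):
    print(n, fraction_far_from_mean(n, 0.05), chebyshev_bound(n, 0.05))
```

Both columns shrink as n grows, with the simulated frequency typically well below the bound, which is consistent with Chebyshev's inequality being valid but not tight.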


The weak law of large numbers was originally proven by James Bernoulli for the
special case where the $X_i$ are 0, 1 (that is, Bernoulli) random variables. His statement
and proof of this theorem were presented in his book Ars Conjectandi, which was 
published in 1713, eight years after his death, by his nephew Nicholas Bernoulli. Note 
that because Chebyshev’s inequality was not known in Bernoulli’s time, Bernoulli 
had to resort to a quite ingenious proof to establish the result. The general form of 
the weak law of large numbers presented in Theorem 2.1 was proved by the Russian 
mathematician Khintchine. 


8.3. The Central Limit Theorem 


The central limit theorem is one of the most remarkable results in probability theory.
Loosely put, it states that the sum of a large number of independent random variables
has a distribution that is approximately normal. Hence, it not only provides
a simple method for computing approximate probabilities for sums of independent
random variables, but also helps explain the remarkable fact that the empirical
frequencies of so many natural populations exhibit bell-shaped (that is, normal) curves.
In its simplest form, the central limit theorem is as follows.


Theorem 3.1 The central limit theorem

Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables,
each having mean $\mu$ and variance $\sigma^2$. Then the distribution of

$$\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$$

tends to the standard normal as $n \to \infty$. That is, for $-\infty < a < \infty$,

$$P\left\{\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le a\right\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{a} e^{-x^2/2}\,dx \quad \text{as } n \to \infty$$

The key to the proof of the central limit theorem is the following lemma, which
we state without proof.


Lemma 3.1

Let $Z_1, Z_2, \ldots$ be a sequence of random variables having distribution functions $F_{Z_n}$
and moment generating functions $M_{Z_n}$, $n \ge 1$, and let Z be a random variable having
distribution function $F_Z$ and moment generating function $M_Z$. If $M_{Z_n}(t) \to M_Z(t)$
for all t, then $F_{Z_n}(t) \to F_Z(t)$ for all t at which $F_Z(t)$ is continuous.


If we let Z be a standard normal random variable, then, since $M_Z(t) = e^{t^2/2}$,
it follows from Lemma 3.1 that if $M_{Z_n}(t) \to e^{t^2/2}$ as $n \to \infty$, then $F_{Z_n}(t) \to \Phi(t)$ as
$n \to \infty$.
We are now ready to prove the central limit theorem. 


Proof of the Central Limit Theorem: Let us assume at first that $\mu = 0$ and $\sigma^2 = 1$.
We shall prove the theorem under the assumption that the moment generating function
of the $X_i$, M(t), exists and is finite. Now, the moment generating function of
$X_i/\sqrt{n}$ is given by

$$E\left[\exp\left\{\frac{tX_i}{\sqrt{n}}\right\}\right] = M\left(\frac{t}{\sqrt{n}}\right)$$

Thus, the moment generating function of $\sum_{i=1}^{n} X_i/\sqrt{n}$ is given by $\left[M\left(\frac{t}{\sqrt{n}}\right)\right]^n$. Let

$$L(t) = \log M(t)$$
and note that

$$L(0) = 0$$
$$L'(0) = \frac{M'(0)}{M(0)} = \mu = 0$$
$$L''(0) = \frac{M(0)M''(0) - [M'(0)]^2}{[M(0)]^2} = E[X^2] = 1$$

Now, to prove the theorem, we must show that $[M(t/\sqrt{n})]^n \to e^{t^2/2}$ as $n \to \infty$, or,
equivalently, that $nL(t/\sqrt{n}) \to t^2/2$ as $n \to \infty$. To show this, note that

$$\lim_{n \to \infty} \frac{L(t/\sqrt{n})}{n^{-1}} = \lim_{n \to \infty} \frac{-L'(t/\sqrt{n})\,n^{-3/2}\,t}{-2n^{-2}} \quad \text{by L’Hôpital’s rule}$$
$$= \lim_{n \to \infty} \left[\frac{L'(t/\sqrt{n})\,t}{2n^{-1/2}}\right]$$
$$= \lim_{n \to \infty} \left[\frac{-L''(t/\sqrt{n})\,n^{-3/2}\,t^2}{-2n^{-3/2}}\right] \quad \text{again by L’Hôpital’s rule}$$
$$= \lim_{n \to \infty} \left[L''\left(\frac{t}{\sqrt{n}}\right)\frac{t^2}{2}\right]$$
$$= \frac{t^2}{2}$$

Thus, the central limit theorem is proven when $\mu = 0$ and $\sigma^2 = 1$. The result
now follows in the general case by considering the standardized random variables
$X_i^* = (X_i - \mu)/\sigma$ and applying the preceding result, since $E[X_i^*] = 0$, $\mathrm{Var}(X_i^*) = 1$.
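
The convergence $[M(t/\sqrt{n})]^n \to e^{t^2/2}$ at the heart of the proof can be observed numerically for a concrete distribution. The sketch below assumes, purely for illustration, that each $X_i$ equals +1 or -1 with probability 1/2, so that $\mu = 0$, $\sigma^2 = 1$, and $M(t) = \cosh t$; the function name and the value t = 1.5 are our own choices.

```python
import math

def mgf_of_standardized_sum(t, n):
    """[M(t / sqrt(n))]^n for X_i = +1 or -1 with probability 1/2 each, where M(t) = cosh(t)."""
    return math.cosh(t / math.sqrt(n)) ** n

t = 1.5
limit = math.exp(t ** 2 / 2)  # moment generating function of a standard normal at t
for n in (1, 10, 100, 10_000):
    print(n, mgf_of_standardized_sum(t, n), limit)
```

The second column approaches the third as n increases, as the L’Hôpital computation predicts.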


Remark Although Theorem 3.1 states only that, for each a,

$$P\left\{\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le a\right\} \to \Phi(a)$$

it can, in fact, be shown that the convergence is uniform in a. [We say that $f_n(a) \to f(a)$
uniformly in a if, for each $\varepsilon > 0$, there exists an N such that $|f_n(a) - f(a)| < \varepsilon$ for
all a whenever $n \ge N$.]


The first version of the central limit theorem was proven by DeMoivre around 
1733 for the special case where the $X_i$ are Bernoulli random variables with $p = \frac{1}{2}$.
The theorem was subsequently extended by Laplace to the case of arbitrary p. (Since 
a binomial random variable may be regarded as the sum of n independent and identi- 
cally distributed Bernoulli random variables, this justifies the normal approximation 
to the binomial that was presented in Section 5.4.1.) Laplace also discovered the 
more general form of the central limit theorem given in Theorem 3.1. His proof, 
however, was not completely rigorous and, in fact, cannot easily be made rigorous. 
A truly rigorous proof of the central limit theorem was first presented by the Russian 
mathematician Liapounoff in the period 1901-1902. 

Figure 8.1 illustrates the central limit theorem by plotting the probability mass
functions of the sum of $n$ independent random variables having a specified mass function when
(a) $n = 5$, (b) $n = 10$, (c) $n = 25$, and (d) $n = 100$.
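
The mass function of such a sum can be computed directly by repeated convolution. The following is a minimal sketch, not the program that produced the figure. The function names are illustrative, and the mass function on {0, 1, 2, 3, 4} is an assumption, chosen so that the mean and variance of the sum for n = 5 come out to 10.75 and 12.6375.

```python
def pmf_of_sum(pmf, n):
    """Mass function of the sum of n i.i.d. variables; pmf[k] = P{X = k}."""
    total = [1.0]  # the empty sum equals 0 with probability 1
    for _ in range(n):
        nxt = [0.0] * (len(total) + len(pmf) - 1)
        for s, ps in enumerate(total):
            for x, px in enumerate(pmf):
                nxt[s + x] += ps * px  # convolve one more variable into the sum
        total = nxt
    return total


def mean_and_variance(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var


# Assumed mass function on {0, 1, 2, 3, 4} (an illustration, see the text above).
BASE_PMF = [0.25, 0.15, 0.10, 0.20, 0.30]

for n in (5, 10, 25, 100):
    mean, var = mean_and_variance(pmf_of_sum(BASE_PMF, n))
    print(n, round(mean, 4), round(var, 4))
```

Plotting the list returned by `pmf_of_sum` for increasing n should show the bell shape emerging, while the mean and variance grow linearly in n.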


An astronomer is interested in measuring the distance, in light-years, from his obser- 
vatory to a distant star. Although the astronomer has a measuring technique, he 
knows that because of changing atmospheric conditions and normal error, each time 
a measurement is made, it will not yield the exact distance, but merely an estimate. 
As a result, the astronomer plans to make a series of measurements and then use the 
average value of these measurements as his estimated value of the actual distance. 
If the astronomer believes that the values of the measurements are independent 
and identically distributed random variables having a common mean d (the actual 
distance) and a common variance of 4 (light-years), how many measurements need 
he make to be reasonably sure that his estimated distance is accurate to within $\pm .5$
light-year?


Solution Suppose that the astronomer decides to make $n$ observations. If $X_1, X_2, \ldots, X_n$ are the $n$ measurements, then, from the central limit theorem, it follows that

\[ Z_n = \frac{\sum_{i=1}^{n} X_i - nd}{2\sqrt{n}} \]

has approximately a standard normal distribution. Hence,

\[
\begin{aligned}
P\left\{ -.5 \le \frac{\sum_{i=1}^{n} X_i}{n} - d \le .5 \right\}
&= P\left\{ -.5\,\frac{\sqrt{n}}{2} \le Z_n \le .5\,\frac{\sqrt{n}}{2} \right\} \\
&\approx \Phi\!\left(\frac{\sqrt{n}}{4}\right) - \Phi\!\left(-\frac{\sqrt{n}}{4}\right) = 2\Phi\!\left(\frac{\sqrt{n}}{4}\right) - 1
\end{aligned}
\]

[Figure 8.1(a): Central Limit Theorem; mass function of the sum for n = 5. Mean = 10.75, Variance = 12.6375]


Therefore, if the astronomer wants, for instance, to be 95 percent certain that his 
estimated value is accurate to within .5 light-year, he should make n* measurements, 
where n* is such that 


\[ 2\Phi\!\left(\frac{\sqrt{n^*}}{4}\right) - 1 = .95 \quad \text{or} \quad \Phi\!\left(\frac{\sqrt{n^*}}{4}\right) = .975 \]

Thus, from Table 5.1 of Chapter 5,

\[ \frac{\sqrt{n^*}}{4} = 1.96 \quad \text{or} \quad n^* = (7.84)^2 \approx 61.47 \]

As $n^*$ is not integral valued, he should make 62 observations.

[Figure 8.1(b): Central Limit Theorem; mass function of the sum for n = 10. Mean = 21.5, Variance = 25.275]

Note, however, that the preceding analysis has been done under the assumption
that the normal approximation will be a good approximation when $n = 62$. Although


this will usually be the case, in general the question of how large $n$ need be before
the approximation is “good” depends on the distribution of the $X_i$. If the astronomer
is concerned about this point and wants to take no chances, he can still solve his
problem by using Chebyshev’s inequality. Since

\[ E\left[ \sum_{i=1}^{n} \frac{X_i}{n} \right] = d \qquad \mathrm{Var}\left( \sum_{i=1}^{n} \frac{X_i}{n} \right) = \frac{4}{n} \]

Chebyshev’s inequality yields

\[ P\left\{ \left| \sum_{i=1}^{n} \frac{X_i}{n} - d \right| > .5 \right\} \le \frac{4}{n(.5)^2} = \frac{16}{n} \]

Hence, if he makes $n = 16/.05 = 320$ observations, he can be 95 percent certain that
his estimate will be accurate to within .5 light-year.
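
The two sample-size calculations can be reproduced numerically. The sketch below is illustrative only: the function names are not from the text, the normal quantile comes from Python's standard library rather than Table 5.1, and the Chebyshev calculation uses exact rational arithmetic so that the rounding up is not disturbed by floating-point error.

```python
from fractions import Fraction
from math import ceil
from statistics import NormalDist


def n_by_clt(variance, half_width, confidence):
    """Smallest n with 2*Phi(half_width*sqrt(n)/sigma) - 1 >= confidence."""
    z = NormalDist().inv_cdf((1 + float(confidence)) / 2)
    return ceil(z * z * float(variance) / float(half_width) ** 2)


def n_by_chebyshev(variance, half_width, confidence):
    """Smallest n with variance / (n * half_width^2) <= 1 - confidence."""
    bound = Fraction(variance) / (Fraction(half_width) ** 2 * (1 - Fraction(confidence)))
    return ceil(bound)


print(n_by_clt(4, 0.5, 0.95))                                # normal approximation
print(n_by_chebyshev(4, Fraction(1, 2), Fraction(95, 100)))  # distribution-free bound
```

The gap between the two answers is the price of making no assumption about how well the normal approximation fits.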


The number of students who enroll in a psychology course is a Poisson random variable with mean 100. The professor in charge of the course has decided that if the
number enrolling is 120 or more, he will teach the course in two separate sections,
whereas if fewer than 120 students enroll, he will teach all of the students together
in a single section. What is the probability that the professor will have to teach two
sections?

[Figure 8.1(c): Central Limit Theorem; mass function of the sum for n = 25. Mean = 53.75, Variance = 63.1875]


Solution The exact solution

\[ e^{-100} \sum_{i=120}^{\infty} \frac{(100)^i}{i!} \]

does not readily yield a numerical answer. However, by recalling that a Poisson 
random variable with mean 100 is the sum of 100 independent Poisson random vari- 
ables, each with mean 1, we can make use of the central limit theorem to obtain an 
approximate solution. If X denotes the number of students who enroll in the course, 
we have 


\[
\begin{aligned}
P\{X \ge 120\} &= P\{X \ge 119.5\} && \text{(the continuity correction)} \\
&= P\left\{ \frac{X - 100}{\sqrt{100}} \ge \frac{119.5 - 100}{\sqrt{100}} \right\} \\
&\approx 1 - \Phi(1.95) \\
&\approx .0256
\end{aligned}
\]

where we have used the fact that the variance of a Poisson random variable is equal
to its mean.

[Figure 8.1(d): Central Limit Theorem; mass function of the sum for n = 100. Mean = 215, Variance = 252.75]
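
Although the exact Poisson tail is awkward by hand, it is easy to evaluate numerically, which gives a feel for the quality of the approximation. The following sketch uses illustrative function names and builds the Poisson probabilities recursively from $P\{X = 0\} = e^{-100}$.

```python
from math import exp, sqrt
from statistics import NormalDist


def poisson_tail(mean, k):
    """Exact P{X >= k} for a Poisson random variable with the given mean."""
    term = exp(-mean)  # P{X = 0}
    below = 0.0
    for i in range(k):
        below += term           # accumulate P{X = i}
        term *= mean / (i + 1)  # P{X = i + 1} from P{X = i}
    return 1.0 - below


def normal_tail_with_correction(mean, k):
    """Normal approximation to P{X >= k} with the continuity correction."""
    z = (k - 0.5 - mean) / sqrt(mean)  # the variance of a Poisson equals its mean
    return 1.0 - NormalDist().cdf(z)


print(poisson_tail(100, 120), normal_tail_with_correction(100, 120))
```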


If 10 fair dice are rolled, find the approximate probability that the sum obtained is 
between 30 and 40, inclusive. 


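Solution Let $X_i$ denote the value of the $i$th die, $i = 1, 2, \ldots, 10$, and let $X = \sum_{i=1}^{10} X_i$ denote the sum. Since

\[ E(X_i) = \frac{7}{2}, \qquad \mathrm{Var}(X_i) = E[X_i^2] - (E[X_i])^2 = \frac{35}{12}, \]

the central limit theorem yields

\[
\begin{aligned}
P\{29.5 \le X \le 40.5\}
&= P\left\{ \frac{29.5 - 35}{\sqrt{350/12}} \le \frac{X - 35}{\sqrt{350/12}} \le \frac{40.5 - 35}{\sqrt{350/12}} \right\} \\
&\approx 2\Phi(1.0184) - 1 \\
&\approx .692
\end{aligned}
\]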
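
Because the sum of 10 dice takes only finitely many values, its exact distribution can be enumerated and compared with the approximation. This is a sketch with illustrative function names, not part of the original solution.

```python
from math import sqrt
from statistics import NormalDist


def dice_sum_pmf(num_dice):
    """Exact mass function of the sum of num_dice fair dice, as {sum: probability}."""
    pmf = {0: 1.0}
    for _ in range(num_dice):
        nxt = {}
        for total, prob in pmf.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0.0) + prob / 6
        pmf = nxt
    return pmf


def clt_interval(num_dice, low, high):
    """Normal approximation to P{low <= sum <= high} with continuity correction."""
    mean = 3.5 * num_dice
    sd = sqrt(num_dice * 35 / 12)
    z = NormalDist()
    return z.cdf((high + 0.5 - mean) / sd) - z.cdf((low - 0.5 - mean) / sd)


exact = sum(p for s, p in dice_sum_pmf(10).items() if 30 <= s <= 40)
approx = clt_interval(10, 30, 40)
print(exact, approx)
```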
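Example 3d Let $X_i$, $i = 1, \ldots, 10$, be independent random variables, each uniformly distributed over (0, 1). Calculate an approximation to $P\left\{ \sum_{i=1}^{10} X_i > 6 \right\}$.

Solution Since $E[X_i] = \frac{1}{2}$ and $\mathrm{Var}(X_i) = \frac{1}{12}$, we have, by the central limit theorem,

\[
\begin{aligned}
P\left\{ \sum_{i=1}^{10} X_i > 6 \right\}
&= P\left\{ \frac{\sum_{i=1}^{10} X_i - 5}{\sqrt{10\left(\frac{1}{12}\right)}} > \frac{6 - 5}{\sqrt{10\left(\frac{1}{12}\right)}} \right\} \\
&\approx 1 - \Phi(\sqrt{1.2}) \\
&\approx .1367
\end{aligned}
\]

Hence, $\sum_{i=1}^{10} X_i$ will be greater than 6 only 14 percent of the time.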


Example 3e An instructor has 50 exams that will be graded in sequence. The times required to
grade the 50 exams are independent, with a common distribution that has mean
20 minutes and standard deviation 4 minutes. Approximate the probability that the
instructor will grade at least 25 of the exams in the first 450 minutes of work.


Solution If we let $X_i$ be the time that it takes to grade exam $i$, then

\[ X = \sum_{i=1}^{25} X_i \]

is the time it takes to grade the first 25 exams. Because the instructor will grade at
least 25 exams in the first 450 minutes of work if the time it takes to grade the first 25
exams is less than or equal to 450, we see that the desired probability is $P\{X \le 450\}$.
To approximate this probability, we use the central limit theorem. Now,

\[ E[X] = \sum_{i=1}^{25} E[X_i] = 25(20) = 500 \]

and

\[ \mathrm{Var}(X) = \sum_{i=1}^{25} \mathrm{Var}(X_i) = 25(16) = 400 \]

Consequently, with $Z$ being a standard normal random variable, we have

\[
\begin{aligned}
P\{X \le 450\} &= P\left\{ \frac{X - 500}{\sqrt{400}} \le \frac{450 - 500}{\sqrt{400}} \right\} \\
&\approx P\{Z \le -2.5\} \\
&= P\{Z \ge 2.5\} \\
&= 1 - \Phi(2.5) \approx .006
\end{aligned}
\]

Central limit theorems also exist when the $X_i$ are independent, but not necessarily identically distributed random variables. One version, by no means the most
general, is as follows.

Theorem 3.2 Central limit theorem for independent random variables

Let $X_1, X_2, \ldots$ be a sequence of independent random variables having respective
means and variances $\mu_i = E[X_i]$, $\sigma_i^2 = \mathrm{Var}(X_i)$. If (a) the $X_i$ are uniformly
bounded, that is, if for some $M$, $P\{|X_i| < M\} = 1$ for all $i$, and (b) $\sum_{i=1}^{\infty} \sigma_i^2 = \infty$, then

\[ P\left\{ \frac{\sum_{i=1}^{n} (X_i - \mu_i)}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}} \le a \right\} \to \Phi(a) \quad \text{as } n \to \infty \]


Historical note 


Pierre-Simon, Marquis de Laplace (1749-1827) 

The central limit theorem was originally stated and proven by the French math- 
ematician Pierre-Simon, Marquis de Laplace, who came to the theorem from 
his observations that errors of measurement (which can usually be regarded 
as being the sum of a large number of tiny forces) tend to be normally dis- 
tributed. Laplace, who was also a famous astronomer (and indeed was called 
“the Newton of France”), was one of the great early contributors to both probability
and statistics. Laplace was also a popularizer of the uses of probability
in everyday life. He strongly believed in its importance, as is indicated by the 
following quotations taken from his published book Analytical Theory of Prob- 
ability: “We see that the theory of probability is at bottom only common sense 
reduced to calculation; it makes us appreciate with exactitude what reasonable 
minds feel by a sort of instinct, often without being able to account for it.... It is 
remarkable that this science, which originated in the consideration of games of 
chance, should become the most important object of human knowledge.... The 
most important questions of life are, for the most part, really only problems of 
probability.” 

The application of the central limit theorem to show that measurement 
errors are approximately normally distributed is regarded as an important con- 
tribution to science. Indeed, in the seventeenth and eighteenth centuries, the 
central limit theorem was often called the law of frequency of errors. Listen to 
the words of Francis Galton (taken from his book Natural Inheritance, published 
in 1889): “I know of scarcely anything so apt to impress the imagination as the
wonderful form of cosmic order expressed by the ‘Law of Frequency of Error.’
The Law would have been personified by the Greeks and deified, if they had
known of it. It reigns with serenity and in complete self-effacement amidst the
wildest confusion. The huger the mob and the greater the apparent anarchy, the
more perfect is its sway. It is the supreme law of unreason.”


8.4 The Strong Law of Large Numbers 


The strong law of large numbers is probably the best-known result in probability
theory. It states that the average of a sequence of independent random variables
having a common distribution will, with probability 1, converge to the mean of that
distribution.

Theorem 4.1 The strong law of large numbers

Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables, each having a finite mean $\mu = E[X_i]$. Then, with probability 1,

\[ \frac{X_1 + X_2 + \cdots + X_n}{n} \to \mu \quad \text{as } n \to \infty \]
†

As an application of the strong law of large numbers, suppose that a sequence
of independent trials of some experiment is performed. Let $E$ be a fixed event of the
experiment, and denote by $P(E)$ the probability that $E$ occurs on any particular trial.
Letting

\[ X_i = \begin{cases} 1 & \text{if } E \text{ occurs on the } i\text{th trial} \\ 0 & \text{if } E \text{ does not occur on the } i\text{th trial} \end{cases} \]

we have, by the strong law of large numbers, that with probability 1,

\[ \frac{X_1 + \cdots + X_n}{n} \to E[X] = P(E) \tag{4.1} \]

Since $X_1 + \cdots + X_n$ represents the number of times that the event $E$ occurs in the
first $n$ trials, we may interpret Equation (4.1) as stating that with probability 1, the
limiting proportion of time that the event $E$ occurs is just $P(E)$.

Although the theorem can be proven without this assumption, our proof of the
strong law of large numbers will assume that the random variables $X_i$ have a finite
fourth moment. That is, we will suppose that $E[X_i^4] = K < \infty$.


† That is, the strong law of large numbers states that

\[ P\left\{ \lim_{n \to \infty} (X_1 + \cdots + X_n)/n = \mu \right\} = 1 \]

Proof of the Strong Law of Large Numbers: To begin, assume that $\mu$, the mean
of the $X_i$, is equal to 0. Let $S_n = \sum_{i=1}^{n} X_i$ and consider

\[ E[S_n^4] = E[(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)] \]

Expanding the right side of the preceding equation results in terms of the form

\[ X_i^4, \quad X_i^3 X_j, \quad X_i^2 X_j^2, \quad X_i^2 X_j X_k, \quad \text{and} \quad X_i X_j X_k X_l \]

where $i$, $j$, $k$, and $l$ are all different. Because all the $X_i$ have mean 0, it follows by
independence that

\[
\begin{aligned}
E[X_i^3 X_j] &= E[X_i^3]E[X_j] = 0 \\
E[X_i^2 X_j X_k] &= E[X_i^2]E[X_j]E[X_k] = 0 \\
E[X_i X_j X_k X_l] &= 0
\end{aligned}
\]

Now, for a given pair $i$ and $j$, there will be $\binom{4}{2} = 6$ terms in the expansion that will
equal $X_i^2 X_j^2$. Hence, upon expanding the preceding product and taking expectations
term by term, it follows that

\[
\begin{aligned}
E[S_n^4] &= nE[X_i^4] + 6\binom{n}{2} E[X_i^2 X_j^2] \\
&= nK + 3n(n - 1)E[X_i^2]E[X_j^2]
\end{aligned}
\]


where we have once again made use of the independence assumption. Now, since

\[ 0 \le \mathrm{Var}(X_i^2) = E[X_i^4] - (E[X_i^2])^2 \]

we have

\[ (E[X_i^2])^2 \le E[X_i^4] = K \]

Therefore, from the preceding, we obtain

\[ E[S_n^4] \le nK + 3n(n - 1)K \]

which implies that

\[ E\left[ \frac{S_n^4}{n^4} \right] \le \frac{K}{n^3} + \frac{3K}{n^2} \]

Therefore,

\[ E\left[ \sum_{n=1}^{\infty} \frac{S_n^4}{n^4} \right] = \sum_{n=1}^{\infty} E\left[ \frac{S_n^4}{n^4} \right] < \infty \]

But the preceding implies that with probability 1, $\sum_{n=1}^{\infty} S_n^4/n^4 < \infty$. (For if there is a
positive probability that the sum is infinite, then its expected value is infinite.) But
the convergence of a series implies that its $n$th term goes to 0; so we can conclude
that with probability 1,

\[ \lim_{n \to \infty} \frac{S_n^4}{n^4} = 0 \]

But if $S_n^4/n^4 = (S_n/n)^4$ goes to 0, then so must $S_n/n$; hence, we have proven that with
probability 1,

\[ \frac{S_n}{n} \to 0 \quad \text{as } n \to \infty \]
When $\mu$, the mean of the $X_i$, is not equal to 0, we can apply the preceding argument to the random variables $X_i - \mu$ to obtain that with probability 1,

\[ \lim_{n \to \infty} \sum_{i=1}^{n} \frac{X_i - \mu}{n} = 0 \]

or, equivalently,

\[ \lim_{n \to \infty} \sum_{i=1}^{n} \frac{X_i}{n} = \mu \]

which proves the result.

Figure 8.2 illustrates the strong law by giving the results of a simulation of $n$ independent random variables having a specified probability mass function. The averages
of the $n$ variables are given when (a) $n = 100$, (b) $n = 1000$, and (c) $n = 10{,}000$.
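
A simulation of this kind takes only a few lines. The sketch below is illustrative rather than the program behind the figure: the mass function on {0, 1, 2, 3, 4} is an assumption, chosen to have mean 2.05, and the seed is arbitrary, so the sample averages it produces will differ from those reported in the figure.

```python
import random

VALUES = [0, 1, 2, 3, 4]
PROBS = [0.10, 0.20, 0.30, 0.35, 0.05]  # assumed mass function; its mean is 2.05


def running_averages(num_trials, checkpoints, seed=0):
    """Simulate num_trials draws and record the sample average at each checkpoint."""
    rng = random.Random(seed)
    total = 0
    averages = {}
    draws = rng.choices(VALUES, weights=PROBS, k=num_trials)
    for n, x in enumerate(draws, start=1):
        total += x
        if n in checkpoints:
            averages[n] = total / n
    return averages


theoretical_mean = sum(v * p for v, p in zip(VALUES, PROBS))
print(theoretical_mean, running_averages(10_000, {100, 1_000, 10_000}))
```

On a typical run the recorded averages settle toward the theoretical mean as n increases, which is what the strong law asserts happens with probability 1.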

Many students are initially confused about the difference between the weak and
the strong laws of large numbers. The weak law of large numbers states that for any
specified large value $n^*$, $(X_1 + \cdots + X_{n^*})/n^*$ is likely to be near $\mu$. However, it does
not say that $(X_1 + \cdots + X_n)/n$ is bound to stay near $\mu$ for all values of $n$ larger than
$n^*$. Thus, it leaves open the possibility that large values of $|(X_1 + \cdots + X_n)/n - \mu|$
can occur infinitely often (though at infrequent intervals). The strong law shows that
this cannot occur. In particular, it implies that, with probability 1, for any positive
value $\varepsilon$,

\[ \left| \sum_{i=1}^{n} \frac{X_i}{n} - \mu \right| \]

will be greater than $\varepsilon$ only a finite number of times.

[Figure 8.2(a): Strong Law of Large Numbers; simulation for n = 100. Theoretical Mean = 2.05, Sample Mean = 1.89]

[Figure 8.2(b): Strong Law of Large Numbers; simulation for n = 1000. Theoretical Mean = 2.05, Sample Mean = 2.078]

The strong law of large numbers was originally proven, in the special case of 
Bernoulli random variables, by the French mathematician Borel. The general form of 
the strong law presented in Theorem 4.1 was proven by the Russian mathematician 
A.N. Kolmogorov. 


8.5 Other Inequalities and a Poisson Limit Result 


We are sometimes confronted with situations in which we are interested in obtaining an upper bound for a probability of the form $P\{X - \mu \ge a\}$, where $a$ is some
positive value and when only the mean $\mu = E[X]$ and variance $\sigma^2 = \mathrm{Var}(X)$ of the
distribution of $X$ are known. Of course, since $X - \mu \ge a > 0$ implies that $|X - \mu| \ge a$,
it follows from Chebyshev’s inequality that

\[ P\{X - \mu \ge a\} \le P\{|X - \mu| \ge a\} \le \frac{\sigma^2}{a^2} \quad \text{when } a > 0 \]
[Figure 8.2(c): Strong Law of Large Numbers; simulation for n = 10,000. Theoretical Mean = 2.05, Sample Mean = 2.0416]


However, as the following proposition shows, it turns out that we can do better. 


Proposition 5.1 One-sided Chebyshev inequality

If $X$ is a random variable with mean 0 and finite variance $\sigma^2$, then, for any $a > 0$,

\[ P\{X \ge a\} \le \frac{\sigma^2}{\sigma^2 + a^2} \]

Proof Let $b > 0$ and note that

\[ X \ge a \quad \text{is equivalent to} \quad X + b \ge a + b \]

Hence,

\[
\begin{aligned}
P\{X \ge a\} &= P\{X + b \ge a + b\} \\
&\le P\{(X + b)^2 \ge (a + b)^2\}
\end{aligned}
\]

where the inequality is obtained by noting that since $a + b > 0$, $X + b \ge a + b$
implies that $(X + b)^2 \ge (a + b)^2$. Upon applying Markov’s inequality, the
preceding yields that

\[ P\{X \ge a\} \le \frac{E[(X + b)^2]}{(a + b)^2} = \frac{\sigma^2 + b^2}{(a + b)^2} \]

Letting $b = \sigma^2/a$ [which is easily seen to be the value of $b$ that minimizes
$(\sigma^2 + b^2)/(a + b)^2$] gives the desired result.
If the number of items produced in a factory during a week is a random variable
with mean 100 and variance 400, compute an upper bound on the probability that
this week’s production will be at least 120.

Solution It follows from the one-sided Chebyshev inequality that

\[ P\{X \ge 120\} = P\{X - 100 \ge 20\} \le \frac{400}{400 + (20)^2} = \frac{1}{2} \]

Hence, the probability that this week’s production will be 120 or more is at most $\frac{1}{2}$.
If we attempted to obtain a bound by applying Markov’s inequality, then we
would have obtained

\[ P\{X \ge 120\} \le \frac{E(X)}{120} = \frac{5}{6} \]

which is a far weaker bound than the preceding one.
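
The three bounds available from a mean and a variance can be compared side by side. The helper names below are illustrative. The two-sided Chebyshev bound is included only for contrast, and for these numbers it is the trivial bound 1.

```python
from fractions import Fraction


def markov_bound(mean, threshold):
    """P{X >= threshold} <= E[X] / threshold, for nonnegative X."""
    return Fraction(mean, threshold)


def chebyshev_two_sided(variance, a):
    """P{|X - mu| >= a} <= sigma^2 / a^2."""
    return Fraction(variance, a * a)


def chebyshev_one_sided(variance, a):
    """P{X - mu >= a} <= sigma^2 / (sigma^2 + a^2), for a > 0."""
    return Fraction(variance, variance + a * a)


# Weekly production: mean 100, variance 400, threshold 120 (so a = 20).
print(markov_bound(100, 120), chebyshev_two_sided(400, 20), chebyshev_one_sided(400, 20))
```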


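Suppose now that $X$ has mean $\mu$ and variance $\sigma^2$. Since both $X - \mu$ and $\mu - X$
have mean 0 and variance $\sigma^2$, it follows from the one-sided Chebyshev inequality
that, for $a > 0$,

\[ P\{X - \mu \ge a\} \le \frac{\sigma^2}{\sigma^2 + a^2} \]

and

\[ P\{\mu - X \ge a\} \le \frac{\sigma^2}{\sigma^2 + a^2} \]

Thus, we have the following corollary.
If $E[X] = \mu$ and $\mathrm{Var}(X) = \sigma^2$, then, for $a > 0$,

\[
\begin{aligned}
P\{X \ge \mu + a\} &\le \frac{\sigma^2}{\sigma^2 + a^2} \\
P\{X \le \mu - a\} &\le \frac{\sigma^2}{\sigma^2 + a^2}
\end{aligned}
\]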


A set of 200 people consisting of 100 men and 100 women is randomly divided into 
100 pairs of 2 each. Give an upper bound to the probability that at most 30 of these 
pairs will consist of a man and a woman. 


Solution Number the men arbitrarily from 1 to 100, and for $i = 1, 2, \ldots, 100$, let

\[ X_i = \begin{cases} 1 & \text{if man } i \text{ is paired with a woman} \\ 0 & \text{otherwise} \end{cases} \]

Then $X$, the number of man–woman pairs, can be expressed as

\[ X = \sum_{i=1}^{100} X_i \]

Because man $i$ is equally likely to be paired with any of the other 199 people, of
which 100 are women, we have

\[ E[X_i] = P\{X_i = 1\} = \frac{100}{199} \]
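Similarly, for $i \ne j$,

\[
\begin{aligned}
E[X_i X_j] &= P\{X_i = 1, X_j = 1\} \\
&= P\{X_i = 1\} P\{X_j = 1 \mid X_i = 1\} = \frac{100}{199} \cdot \frac{99}{197}
\end{aligned}
\]

where $P\{X_j = 1 \mid X_i = 1\} = 99/197$, since, given that man $i$ is paired with a woman,
man $j$ is equally likely to be paired with any of the remaining 197 people, of which 99
are women. Hence, we obtain

\[
\begin{aligned}
E[X] &= \sum_{i=1}^{100} E[X_i] \\
&= (100)\frac{100}{199} \\
&\approx 50.25 \\
\mathrm{Var}(X) &= \sum_{i=1}^{100} \mathrm{Var}(X_i) + 2 \sum\sum_{i<j} \mathrm{Cov}(X_i, X_j) \\
&= 100 \cdot \frac{100}{199} \cdot \frac{99}{199} + 2\binom{100}{2} \left[ \frac{100}{199} \cdot \frac{99}{197} - \left( \frac{100}{199} \right)^2 \right] \\
&\approx 25.126
\end{aligned}
\]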


The Chebyshev inequality then yields

\[ P\{X \le 30\} \le P\{|X - 50.25| \ge 20.25\} \le \frac{25.126}{(20.25)^2} \approx .061 \]

Thus, there are at most about 6 chances in 100 that at most 30 men will be paired with
women. However, we can improve on this bound by using the one-sided Chebyshev
inequality, which yields

\[
\begin{aligned}
P\{X \le 30\} &= P\{X \le 50.25 - 20.25\} \\
&\le \frac{25.126}{25.126 + (20.25)^2} \\
&\approx .058
\end{aligned}
\]
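
The mean, variance, and both bounds can be checked with exact rational arithmetic. This is a sketch under the setup of the example: the function name and parameters are illustrative, and the small differences from the rounded figures .061 and .058 come from using the unrounded mean and variance.

```python
from fractions import Fraction
from math import comb


def pair_count_moments(num_men=100, num_women=100):
    """Exact mean and variance of the number of man-woman pairs."""
    people = num_men + num_women
    p1 = Fraction(num_women, people - 1)          # P{X_i = 1}
    p2 = p1 * Fraction(num_women - 1, people - 3)  # P{X_i = 1, X_j = 1}
    mean = num_men * p1
    variance = num_men * p1 * (1 - p1) + 2 * comb(num_men, 2) * (p2 - p1 * p1)
    return mean, variance


mean, variance = pair_count_moments()
a = mean - 30                                  # distance from the mean down to 30
two_sided = variance / a ** 2                  # Chebyshev
one_sided = variance / (variance + a ** 2)     # one-sided Chebyshev
print(float(mean), float(variance), float(two_sided), float(one_sided))
```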


When the moment generating function of the random variable $X$ is known, we
can obtain even more effective bounds on $P\{X \ge a\}$. Let

\[ M(t) = E[e^{tX}] \]

be the moment generating function of the random variable $X$. Then, for $t > 0$,

\[
\begin{aligned}
P\{X \ge a\} &= P\{e^{tX} \ge e^{ta}\} \\
&\le E[e^{tX}]e^{-ta} && \text{by Markov’s inequality}
\end{aligned}
\]

Similarly, for $t < 0$,

\[
\begin{aligned}
P\{X \le a\} &= P\{e^{tX} \ge e^{ta}\} \\
&\le E[e^{tX}]e^{-ta}
\end{aligned}
\]

Thus, we have the following inequalities, known as Chernoff bounds.

Proposition 5.2 Chernoff bounds

\[
\begin{aligned}
P\{X \ge a\} &\le e^{-ta}M(t) && \text{for all } t > 0 \\
P\{X \le a\} &\le e^{-ta}M(t) && \text{for all } t < 0
\end{aligned}
\]

Since the Chernoff bounds hold for all $t$ in either the positive or negative quadrant,
we obtain the best bound on $P\{X \ge a\}$ by using the $t$ that minimizes $e^{-ta}M(t)$.
Chernoff bounds for the standard normal random variable

If $Z$ is a standard normal random variable, then its moment generating function is
$M(t) = e^{t^2/2}$, so the Chernoff bound on $P\{Z \ge a\}$ is given by

\[ P\{Z \ge a\} \le e^{-ta}e^{t^2/2} \quad \text{for all } t > 0 \]

Now the value of $t$, $t > 0$, that minimizes $e^{t^2/2 - ta}$ is the value that minimizes $t^2/2 - ta$,
which is $t = a$. Thus, for $a > 0$, we have

\[ P\{Z \ge a\} \le e^{-a^2/2} \]

Similarly, we can show that, for $a < 0$,

\[ P\{Z \le a\} \le e^{-a^2/2} \]


Chernoff bounds for the Poisson random variable

If $X$ is a Poisson random variable with parameter $\lambda$, then its moment generating
function is $M(t) = e^{\lambda(e^t - 1)}$. Hence, the Chernoff bound on $P\{X \ge i\}$ is

\[ P\{X \ge i\} \le e^{\lambda(e^t - 1)}e^{-it} \quad t > 0 \]

Minimizing the right side of the preceding inequality is equivalent to minimizing
$\lambda(e^t - 1) - it$, and calculus shows that the minimal value occurs when $e^t = i/\lambda$.
Provided that $i/\lambda > 1$, this minimizing value of $t$ will be positive. Therefore, assuming
that $i > \lambda$ and letting $e^t = i/\lambda$ in the Chernoff bound yields

\[ P\{X \ge i\} \le e^{\lambda(i/\lambda - 1)} \left( \frac{\lambda}{i} \right)^i \]

or, equivalently,

\[ P\{X \ge i\} \le \frac{e^{-\lambda}(e\lambda)^i}{i^i} \]
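
To see how tight these bounds are, they can be evaluated next to the quantities they bound. The sketch below uses illustrative function names and, as an arbitrary test case, a Poisson random variable with $\lambda = 100$ and $i = 120$. The Poisson bound is computed on the log scale to avoid overflow in $(e\lambda)^i$.

```python
from math import exp, log
from statistics import NormalDist


def chernoff_normal(a):
    """Chernoff bound e^{-a^2/2} on P{Z >= a}, valid for a > 0."""
    return exp(-a * a / 2)


def chernoff_poisson(lam, i):
    """Chernoff bound e^{-lam} (e lam)^i / i^i on P{X >= i}, valid for i > lam."""
    return exp(-lam + i + i * log(lam / i))


def poisson_tail(lam, i):
    """Exact P{X >= i}, summing the Poisson probabilities below i."""
    term, below = exp(-lam), 0.0
    for k in range(i):
        below += term
        term *= lam / (k + 1)
    return 1.0 - below


print(chernoff_normal(2.0), 1.0 - NormalDist().cdf(2.0))
print(chernoff_poisson(100, 120), poisson_tail(100, 120))
```

In both cases the bound holds but is far from sharp, which is typical: the Chernoff bound trades accuracy for generality.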


Consider a gambler who is equally likely to either win or lose 1 unit on every play,
independently of his past results. That is, if $X_i$ is the gambler’s winnings on the $i$th
play, then the $X_i$ are independent and

\[ P\{X_i = 1\} = P\{X_i = -1\} = \frac{1}{2} \]
Let $S_n = \sum_{i=1}^{n} X_i$ denote the gambler’s winnings after $n$ plays. We will use the Chernoff
bound on $P\{S_n \ge a\}$. To start, note that the moment generating function of $X_i$ is

\[ E[e^{tX}] = \frac{e^t + e^{-t}}{2} \]

Now, using the McLaurin expansions of $e^t$ and $e^{-t}$, we see that

\[
\begin{aligned}
e^t + e^{-t} &= \left( 1 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots \right) + \left( 1 - t + \frac{t^2}{2!} - \frac{t^3}{3!} + \cdots \right) \\
&= 2\left\{ 1 + \frac{t^2}{2!} + \frac{t^4}{4!} + \cdots \right\} \\
&= 2\sum_{n=0}^{\infty} \frac{t^{2n}}{(2n)!} \\
&\le 2\sum_{n=0}^{\infty} \frac{(t^2/2)^n}{n!} && \text{since } (2n)! \ge n!\,2^n \\
&= 2e^{t^2/2}
\end{aligned}
\]

Therefore,

\[ E[e^{tX}] \le e^{t^2/2} \]

Since the moment generating function of the sum of independent random variables
is the product of their moment generating functions, we have

\[
\begin{aligned}
E[e^{tS_n}] &= (E[e^{tX}])^n \\
&\le e^{nt^2/2}
\end{aligned}
\]

Using the preceding result along with the Chernoff bound gives

\[ P\{S_n \ge a\} \le e^{-ta}e^{nt^2/2} \quad t > 0 \]

The value of $t$ that minimizes the right side of the preceding is the value that minimizes $nt^2/2 - ta$, and this value is $t = a/n$. Supposing that $a > 0$ (so that the
minimizing $t$ is positive) and letting $t = a/n$ in the preceding inequality yields

\[ P\{S_n \ge a\} \le e^{-a^2/2n} \quad a > 0 \]

This latter inequality yields, for example,

\[ P\{S_{10} \ge 6\} \le e^{-36/20} \approx .1653 \]

whereas the exact probability is

\[
\begin{aligned}
P\{S_{10} \ge 6\} &= P\{\text{gambler wins at least 8 of the first 10 games}\} \\
&= \frac{\binom{10}{8} + \binom{10}{9} + \binom{10}{10}}{2^{10}} = \frac{56}{1024} \approx .0547
\end{aligned}
\]
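
The comparison between the Chernoff bound and the exact probability extends to any $n$ and $a$. In the sketch below (function names are illustrative), the exact tail uses the fact that $S_n = 2W - n$, where $W$, the number of wins, is binomial with parameters $n$ and $\frac{1}{2}$.

```python
from math import comb, exp


def chernoff_walk_bound(n, a):
    """Chernoff bound e^{-a^2 / 2n} on P{S_n >= a}, valid for a > 0."""
    return exp(-a * a / (2 * n))


def exact_walk_tail(n, a):
    """Exact P{S_n >= a}, where S_n = 2W - n and W counts the wins in n fair plays."""
    favorable = sum(comb(n, w) for w in range(n + 1) if 2 * w - n >= a)
    return favorable / 2 ** n


print(chernoff_walk_bound(10, 6), exact_walk_tail(10, 6))
```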
The next inequality is one having to do with expectations rather than probabilities. Before stating it, we need the following definition.

Definition
A twice-differentiable real-valued function $f(x)$ is said to be convex if $f''(x) \ge 0$
for all $x$; similarly, it is said to be concave if $f''(x) \le 0$.

Some examples of convex functions are $f(x) = x^2$, $f(x) = e^{ax}$, and $f(x) = -x^{1/n}$
for $x \ge 0$. If $f(x)$ is convex, then $g(x) = -f(x)$ is concave, and vice versa.

Proposition 5.3 Jensen’s inequality

If $f(x)$ is a convex function, then

\[ E[f(X)] \ge f(E[X]) \]

provided that the expectations exist and are finite.
Proof Expanding $f(x)$ in a Taylor’s series expansion about $\mu = E[X]$ yields

\[ f(x) = f(\mu) + f'(\mu)(x - \mu) + \frac{f''(\xi)(x - \mu)^2}{2} \]

where $\xi$ is some value between $x$ and $\mu$. Since $f''(\xi) \ge 0$, we obtain

\[ f(x) \ge f(\mu) + f'(\mu)(x - \mu) \]

Hence,

\[ f(X) \ge f(\mu) + f'(\mu)(X - \mu) \]

Taking expectations yields

\[ E[f(X)] \ge f(\mu) + f'(\mu)E[X - \mu] = f(\mu) \]

and the inequality is established.


An investor is faced with the following choices: Either she can invest all of her money 
in a risky proposition that would lead to a random return X that has mean m, or 
she can put the money into a risk-free venture that will lead to a return of m with 
probability 1. Suppose that her decision will be made on the basis of maximizing 
the expected value of $u(R)$, where $R$ is her return and $u$ is her utility function. By
Jensen’s inequality, it follows that if $u$ is a concave function, then $E[u(X)] \le u(m)$, so
the risk-free alternative is preferable, whereas if $u$ is convex, then $E[u(X)] \ge u(m)$,
so the risky investment alternative would be preferred.
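
A small numerical check makes the investor's comparison concrete. Everything in the sketch below is hypothetical: the three-point return distribution, the concave utility $\sqrt{x}$, and the convex utility $x^2$ are chosen only for illustration.

```python
from math import sqrt


def expectation(g, pmf):
    """E[g(X)] for a discrete X given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())


# Hypothetical risky return X; its mean m is the risk-free return.
RETURNS = {0.0: 0.25, 1.0: 0.50, 4.0: 0.25}
m = expectation(lambda x: x, RETURNS)

concave_risky, concave_safe = expectation(sqrt, RETURNS), sqrt(m)
convex_risky, convex_safe = expectation(lambda x: x * x, RETURNS), m * m

print(m, concave_risky, concave_safe, convex_risky, convex_safe)
```

With the concave utility the risk-free value is the larger of the pair, and with the convex utility the risky expectation is the larger, in line with Jensen's inequality.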


The following proposition, which implies that the covariance of two increasing 
functions of a random variable is nonnegative, is quite useful. 


Proposition 5.4 If $f$ and $g$ are increasing functions then

\[ E[f(X)g(X)] \ge E[f(X)]E[g(X)] \]


Proof To prove the preceding inequality suppose that $X$ and $Y$ are independent with
the same distribution and that $f$ and $g$ are both increasing functions. Then, because $f$
and $g$ are increasing, $f(X) - f(Y)$ and $g(X) - g(Y)$ will both be nonnegative when $X > Y$
and will both be nonpositive when $X < Y$. Consequently, their product is nonnegative. That
is,

\[ (f(X) - f(Y))(g(X) - g(Y)) \ge 0 \]

Taking expectations gives

\[ E[(f(X) - f(Y))(g(X) - g(Y))] \ge 0 \]

Multiplying through and taking expectations term by term yields

\[ E[f(X)g(X)] - E[f(X)g(Y)] - E[f(Y)g(X)] + E[f(Y)g(Y)] \ge 0 \tag{5.1} \]

Now,

\[
\begin{aligned}
E[f(X)g(Y)] &= E[f(X)]E[g(Y)] && \text{by independence of } X \text{ and } Y \\
&= E[f(X)]E[g(X)] && \text{because } X \text{ and } Y \text{ have the same distribution}
\end{aligned}
\]

Similarly, $E[f(Y)g(X)] = E[f(Y)]E[g(X)] = E[f(X)]E[g(X)]$, and $E[f(Y)g(Y)] =
E[f(X)g(X)]$. Hence, from Equation (5.1) we obtain that

\[ 2E[f(X)g(X)] - 2E[f(X)]E[g(X)] \ge 0 \]

which proves the result.


Suppose there are $m$ days in a year, and that each person is independently born
on day $r$ with probability $p_r$, $r = 1, \ldots, m$, $\sum_{r=1}^{m} p_r = 1$. Let $A_{i,j}$ be the event that
persons $i$ and $j$ are born on the same day. In Example 5c of Chapter 4, we showed that
the information that persons 1 and 2 have the same birthday makes it more likely
that persons 1 and 3 have the same birthday. After proving this result, we argued
that it was intuitive because if “popular days” are the ones whose probabilities are
relatively large, then knowing that 1 and 2 share the same birthday makes it more
likely (than when we have no information) that the birthday of person 1 is a popular
day and that makes it more likely that person 3 will have the same birthday as does
1. To give credence to this intuition, suppose that the days are renumbered so that $p_r$
is an increasing function of $r$. That is, renumber the days so that day 1 is the day with
lowest birthday probability, day 2 is the day with second lowest birthday probability,
and so on. Letting $X$ be the birthday of person 1, then because the higher numbered
days are the most popular, our intuitive explanation would lead us to believe that
the expected value of $X$ should increase upon the information that 1 and 2 have the
same birthday. That is, it should be that $E[X \mid A_{1,2}] \ge E[X]$. To verify this, let $Y$ be
the birthday of person 2, and note that


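\[
\begin{aligned}
P(X = r \mid A_{1,2}) &= \frac{P(X = r, A_{1,2})}{P(A_{1,2})} \\
&= \frac{P(X = r, Y = r)}{\sum_r P(X = r, Y = r)} \\
&= \frac{p_r^2}{\sum_r p_r^2}
\end{aligned}
\]

Hence,

\[ E[X \mid A_{1,2}] = \sum_r r P(X = r \mid A_{1,2}) = \frac{\sum_r r p_r^2}{\sum_r p_r^2} \]

Because $E[X] = \sum_r r P(X = r) = \sum_r r p_r$, we need show that

\[ \sum_r r p_r^2 \ge \left( \sum_r r p_r \right) \left( \sum_r p_r^2 \right) \]

But

\[ E[X p_X] = \sum_r r p_r P(X = r) = \sum_r r p_r^2, \qquad E[p_X] = \sum_r p_r^2, \qquad E[X] = \sum_r r p_r \]

and thus we must show that

\[ E[X p_X] \ge E[p_X]E[X] \]

which follows from Proposition 5.4 because both $f(X) = X$ and $g(X) = p_X$ are
increasing functions of $X$.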
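
The inequality $E[X \mid A_{1,2}] \ge E[X]$ is easy to check for any particular set of birthday probabilities. The sketch below uses a hypothetical four-day "year" with increasing probabilities and an illustrative function name.

```python
from fractions import Fraction


def birthday_means(probs):
    """Return (E[X], E[X | A_{1,2}]) where probs[r - 1] = p_r is increasing in r."""
    days = range(1, len(probs) + 1)
    unconditional = sum(r * p for r, p in zip(days, probs))
    conditional = sum(r * p * p for r, p in zip(days, probs)) / sum(p * p for p in probs)
    return unconditional, conditional


# Hypothetical probabilities for a four-day year, already sorted in increasing order.
probs = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
unconditional, conditional = birthday_means(probs)
print(unconditional, conditional)
```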


When $f(x)$ is an increasing and $g(x)$ is a decreasing function, then it is a simple consequence of Proposition 5.4 that

\[ E[f(X)g(X)] \le E[f(X)]E[g(X)] \]

We leave the verification of the preceding as an exercise.
Our final example of this section deals with a Poisson limit result. 


A Poisson Limit Result 


Consider a sequence of independent trials, with each trial being a success with probability $p$. If we let $Y$ be the number of trials until there have been a total of $r$ successes,
then $Y$ is a negative binomial random variable with

\[ E[Y] = \frac{r}{p}, \qquad \mathrm{Var}(Y) = \frac{r(1 - p)}{p^2} \]

Thus, when $p = \frac{r}{r + \lambda}$, we have that

\[ E[Y] = r + \lambda, \qquad \mathrm{Var}(Y) = \frac{\lambda(r + \lambda)}{r} \]

Now, when $r$ is large, $\mathrm{Var}(Y) \approx \lambda$. Thus, as $r$ becomes larger, the mean of $Y$ grows
proportionately with $r$, while the variance converges to $\lambda$. Hence, we might expect
that when $r$ is large $Y$ will be close to its mean value of $r + \lambda$. Now, if we let $X$
be the number of failures that result in those $Y$ trials (that is, $X$ is the number of
failures before there have been a total of $r$ successes) then when $r$ is large, because
$Y$ is approximately $r + \lambda$, it would seem that $X$ would approximately have the distribution of the number of failures in $r + \lambda$ independent trials when each trial is a
failure with probability $1 - p = \frac{\lambda}{r + \lambda}$. But by the Poisson limit of the binomial, such a
random variable should approximately be Poisson with mean $(r + \lambda)\frac{\lambda}{r + \lambda} = \lambda$. That
is, as $r \to \infty$, we might expect that the distribution of $X$ converges to that of a Poisson
random variable with mean $\lambda$. We now show that this is indeed true.
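Because $X = k$ if the $r$th success occurs on trial $r + k$, we see that

\[
\begin{aligned}
P(X = k) &= P(Y = r + k) \\
&= \binom{r + k - 1}{r - 1} p^r (1 - p)^k
\end{aligned}
\]

When $p = \frac{r}{r + \lambda}$, $1 - p = \frac{\lambda}{r + \lambda}$, and so

\[
\begin{aligned}
\binom{r + k - 1}{r - 1} (1 - p)^k &= \binom{r + k - 1}{k} \left( \frac{\lambda}{r + \lambda} \right)^k \\
&= \frac{(r + k - 1)(r + k - 2) \cdots r}{k!} \cdot \frac{\lambda^k}{(r + \lambda)^k} \\
&= \frac{\lambda^k}{k!} \cdot \frac{r + k - 1}{r + \lambda} \cdot \frac{r + k - 2}{r + \lambda} \cdots \frac{r}{r + \lambda} \\
&\to \frac{\lambda^k}{k!} \quad \text{as } r \to \infty
\end{aligned}
\]

Also,

\[ p^r = \left( \frac{r}{r + \lambda} \right)^r = \frac{1}{(1 + \lambda/r)^r} \to e^{-\lambda} \quad \text{as } r \to \infty \]

Thus, we see that

\[ P(X = k) \to e^{-\lambda} \frac{\lambda^k}{k!} \quad \text{as } r \to \infty \]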
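
The convergence can be observed numerically by evaluating the negative binomial probability for increasing $r$. In the sketch below the function names are illustrative, and $\lambda = 2$, $k = 3$ are arbitrary test values.

```python
from math import comb, exp, factorial


def failures_pmf(r, lam, k):
    """P{X = k}: k failures before the r-th success when p = r / (r + lam)."""
    p = r / (r + lam)
    return comb(r + k - 1, k) * p ** r * (1 - p) ** k


def poisson_pmf(lam, k):
    return exp(-lam) * lam ** k / factorial(k)


for r in (10, 100, 1_000, 10_000):
    print(r, failures_pmf(r, 2.0, 3), poisson_pmf(2.0, 3))
```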


8.6 Bounding the Error Probability When Approximating a Sum of Independent Bernoulli Random Variables by a Poisson Random Variable


In this section, we establish bounds on how closely a sum of independent Bernoulli 
random variables is approximated by a Poisson random variable with the same mean. 
Suppose that we want to approximate the sum of independent Bernoulli random
variables with respective means $p_1, p_2, \ldots, p_n$. Starting with a sequence $Y_1, \ldots, Y_n$
of independent Poisson random variables, with $Y_i$ having mean $p_i$, we will construct
a sequence of independent Bernoulli random variables $X_1, \ldots, X_n$ with parameters
$p_1, \ldots, p_n$ such that

\[ P\{X_i \ne Y_i\} \le p_i^2 \quad \text{for each } i \]

Letting $X = \sum_{i=1}^{n} X_i$ and $Y = \sum_{i=1}^{n} Y_i$, we will use the preceding inequality to conclude that

\[ P\{X \ne Y\} \le \sum_{i=1}^{n} p_i^2 \]

Finally, we will show that the preceding inequality implies that for any set of real
numbers $A$,

\[ |P\{X \in A\} - P\{Y \in A\}| \le \sum_{i=1}^{n} p_i^2 \]


Since X is the sum of independent Bernoulli random variables and Y is a Poisson 
random variable, the latter inequality will yield the desired bound. 

To show how the task is accomplished, let $Y_i$, $i = 1, \ldots, n$, be independent Poisson random variables with respective means $p_i$. Now let $U_1, \ldots, U_n$ be independent
random variables that are also independent of the $Y_i$’s and are such that

\[ U_i = \begin{cases} 0 & \text{with probability } (1 - p_i)e^{p_i} \\ 1 & \text{with probability } 1 - (1 - p_i)e^{p_i} \end{cases} \]

This definition implicitly makes use of the inequality

\[ e^{-p} \ge 1 - p \]

in assuming that $(1 - p_i)e^{p_i} \le 1$.
Next, define the random variables $X_i$, $i = 1, \ldots, n$, by

\[ X_i = \begin{cases} 0 & \text{if } Y_i = U_i = 0 \\ 1 & \text{otherwise} \end{cases} \]

Note that

\[
\begin{aligned}
P\{X_i = 0\} &= P\{Y_i = 0\}P\{U_i = 0\} = e^{-p_i}(1 - p_i)e^{p_i} = 1 - p_i \\
P\{X_i = 1\} &= 1 - P\{X_i = 0\} = p_i
\end{aligned}
\]

Now, if $X_i$ is equal to 0, then so must $Y_i$ equal 0 (by the definition of $X_i$). Therefore,

\[
\begin{aligned}
P\{X_i \ne Y_i\} &= P\{X_i = 1, Y_i \ne 1\} \\
&= P\{Y_i = 0, X_i = 1\} + P\{Y_i > 1\} \\
&= P\{Y_i = 0, U_i = 1\} + P\{Y_i > 1\} \\
&= e^{-p_i}[1 - (1 - p_i)e^{p_i}] + 1 - e^{-p_i} - p_i e^{-p_i} \\
&= p_i - p_i e^{-p_i} \\
&\le p_i^2 && (\text{since } 1 - e^{-p} \le p)
\end{aligned}
\]


Now let $X = \sum_{i=1}^{n} X_i$ and $Y = \sum_{i=1}^{n} Y_i$, and note that $X$ is the sum of independent
Bernoulli random variables and $Y$ is Poisson with the expected value $E[Y] = E[X] = \sum_{i=1}^{n} p_i$. Note also that the inequality $X \ne Y$ implies that $X_i \ne Y_i$ for some $i$, so

\[
\begin{aligned}
P\{X \ne Y\} &\le P\{X_i \ne Y_i \text{ for some } i\} \\
&\le \sum_{i=1}^{n} P\{X_i \ne Y_i\} && \text{(Boole’s inequality)} \\
&\le \sum_{i=1}^{n} p_i^2
\end{aligned}
\]


For any event $B$, let $I_B$, the indicator variable for the event $B$, be defined by

\[ I_B = \begin{cases} 1 & \text{if } B \text{ occurs} \\ 0 & \text{otherwise} \end{cases} \]

Note that for any set of real numbers $A$,

\[ I_{\{X \in A\}} - I_{\{Y \in A\}} \le I_{\{X \ne Y\}} \]

The preceding inequality follows from the fact that since an indicator variable is
either 0 or 1, the left-hand side equals 1 only when $I_{\{X \in A\}} = 1$ and $I_{\{Y \in A\}} = 0$. But
this would imply that $X \in A$ and $Y \notin A$, which means that $X \ne Y$, so the right side
would also equal 1. Upon taking expectations of the preceding inequality, we obtain

\[ P\{X \in A\} - P\{Y \in A\} \le P\{X \ne Y\} \]

By reversing $X$ and $Y$, we obtain, in the same manner,

\[ P\{Y \in A\} - P\{X \in A\} \le P\{X \ne Y\} \]

Thus, we can conclude that

\[ |P\{X \in A\} - P\{Y \in A\}| \le P\{X \ne Y\} \]


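Therefore, we have proven that with $\lambda = \sum_{i=1}^{n} p_i$,

$$\left| P\left\{\sum_{i=1}^{n} X_i \in A\right\} - \sum_{i \in A} \frac{e^{-\lambda}\lambda^i}{i!} \right| \le \sum_{i=1}^{n} p_i^2$$

Remark When all the $p_i$ are equal to $p$, $X$ is a binomial random variable. Hence,
the preceding inequality shows that, for any set of nonnegative integers $A$,

$$\left| \sum_{i \in A} \binom{n}{i} p^i (1 - p)^{n-i} - \sum_{i \in A} \frac{e^{-np}(np)^i}{i!} \right| \le np^2$$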
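
The bound in the Remark can be checked numerically. The following is a minimal sketch, not part of the text's argument: the function names and the particular values of $n$ and $p$ are illustrative assumptions. It uses the fact that the left side is largest when $A$ consists of exactly those $i$ at which the binomial mass exceeds the Poisson mass.

```python
from math import comb, exp, factorial


def binomial_pmf(n, p, i):
    """P{X = i} for X binomial with parameters n and p."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def poisson_pmf(lam, i):
    """P{Y = i} for Y Poisson with mean lam."""
    return exp(-lam) * lam**i / factorial(i)


def set_gap(n, p, A):
    """|P{X in A} - P{Y in A}| for X binomial(n, p) and Y Poisson(np)."""
    lam = n * p
    return abs(sum(binomial_pmf(n, p, i) for i in A)
               - sum(poisson_pmf(lam, i) for i in A))


def worst_case_gap(n, p):
    """Largest value of set_gap over all sets A of nonnegative integers.

    The maximum is attained at A = {i : binomial mass > Poisson mass};
    such i satisfy i <= n because the binomial mass is 0 beyond n.
    """
    lam = n * p
    return sum(max(binomial_pmf(n, p, i) - poisson_pmf(lam, i), 0.0)
               for i in range(n + 1))


if __name__ == "__main__":
    # Illustrative parameter choices (assumptions, not from the text).
    for n, p in [(20, 0.1), (100, 0.02), (50, 0.3)]:
        print(n, p, worst_case_gap(n, p), "bound:", n * p * p)
```

In each case the worst-case gap should come out no larger than $np^2$, and the bound is most informative when $p$ is small.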


8.7 The Lorenz Curve 


The Lorenz curve $L(p)$, $0 \le p \le 1$, is a plot of the fraction of the total income of
a population that is earned by the $100p$ percent of the population having the lowest
incomes. For instance, $L(.5)$ is the fraction of total income earned by the lower half
of income earners. Suppose that the earnings of the members of a population can be
represented by the quantities $X_1, X_2, \ldots$ where the $X_i$ are independent and identically
distributed positive continuous random variables with distribution function $F$.
Now, let $X$ be a random variable with distribution $F$, and define $\xi_p$ to be that value
such that

$$P\{X \le \xi_p\} = F(\xi_p) = p$$

The quantity $\xi_p$ is called the $100p$ percentile of the distribution $F$. With $I(x)$ defined
by

$$I(x) = \begin{cases} 1, & \text{if } x < \xi_p \\ 0, & \text{if } x \ge \xi_p \end{cases}$$


it follows that $\frac{I(X_1) + \cdots + I(X_n)}{n}$ is the fraction of the first $n$ members of the population
that have incomes less than $\xi_p$. Upon letting $n \to \infty$, and applying the strong law
of large numbers to the independent and identically distributed random variables
$I(X_k)$, $k \ge 1$, the preceding yields that, with probability 1,

$$\lim_{n \to \infty} \frac{I(X_1) + \cdots + I(X_n)}{n} = E[I(X)] = F(\xi_p) = p$$

That is, with probability 1, $p$ is the fraction of the population whose income is less
than $\xi_p$. The fraction of the total income earned by those earning less than $\xi_p$ can be
obtained by noting that the fraction of the total income of the first $n$ members of the
population that is from those earning less than $\xi_p$ is
$\frac{X_1 I(X_1) + \cdots + X_n I(X_n)}{X_1 + \cdots + X_n}$. Letting $n \to \infty$
yields that

$$L(p) = \lim_{n \to \infty} \frac{\dfrac{X_1 I(X_1) + \cdots + X_n I(X_n)}{n}}{\dfrac{X_1 + \cdots + X_n}{n}} = \frac{E[X I(X)]}{E[X]}$$


where the final equality was obtained by applying the strong law of large numbers to
both the numerator and denominator of the preceding fraction. Letting $\mu = E[X]$,
and noting that

$$E[X I(X)] = \int_0^{\infty} x I(x) f(x)\,dx = \int_0^{\xi_p} x f(x)\,dx$$

shows that

$$L(p) = \frac{E[X I(X)]}{E[X]} = \frac{1}{\mu}\int_0^{\xi_p} x f(x)\,dx \qquad (7.1)$$


Example 7a

If $F$ is the distribution function of a uniform random variable on $(a, b)$, where
$0 \le a < b$, then

$$F(x) = \int_a^x \frac{1}{b - a}\,dx = \frac{x - a}{b - a}, \qquad a < x < b$$

Because $p = F(\xi_p) = \frac{\xi_p - a}{b - a}$, we see that $\xi_p = a + (b - a)p$. Because the mean of a
uniform $(a, b)$ random variable is $(a + b)/2$, we obtain from Equation (7.1) that

$$\begin{aligned}
L(p) &= \frac{2}{a + b}\int_a^{a + (b - a)p} \frac{x}{b - a}\,dx \\
&= \frac{(a + (b - a)p)^2 - a^2}{(a + b)(b - a)} \\
&= \frac{2pa + (b - a)p^2}{a + b}
\end{aligned}$$

When $a = 0$, the preceding gives that $L(p) = p^2$. Also, letting $a$ converge to $b$ gives
that

$$\lim_{a \to b} L(p) = p$$

which can be interpreted as saying that $L(p) = p$ when all members of the population
earn the same amount.
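
As a sanity check on the uniform case, the closed form can be compared with a direct numerical evaluation of $\frac{1}{\mu}\int_a^{\xi_p} x f(x)\,dx$. This is only a sketch: the function names, the midpoint rule, and the test values of $a$, $b$, and $p$ are assumptions chosen for illustration.

```python
def lorenz_uniform(p, a, b):
    """Closed form L(p) = (2pa + (b - a)p^2) / (a + b) for uniform (a, b)."""
    return (2 * p * a + (b - a) * p**2) / (a + b)


def lorenz_uniform_numeric(p, a, b, steps=1000):
    """Midpoint-rule value of (1/mu) * integral from a to xi_p of x f(x) dx,
    where f(x) = 1/(b - a), mu = (a + b)/2 and xi_p = a + (b - a)p.
    Requires a < b."""
    mu = (a + b) / 2
    xi_p = a + (b - a) * p
    width = (xi_p - a) / steps
    integral = sum(a + (k + 0.5) * width for k in range(steps)) * width / (b - a)
    return integral / mu


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, lorenz_uniform(p, 2.0, 5.0), lorenz_uniform_numeric(p, 2.0, 5.0))
```

With $a = 0$ the closed form reduces to $p^2$, and taking $a = b$ in `lorenz_uniform` returns $p$, in line with the two special cases noted above.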


A useful formula for $L(p)$ can be obtained by letting

$$J(x) = 1 - I(x) = \begin{cases} 0, & \text{if } x < \xi_p \\ 1, & \text{if } x \ge \xi_p \end{cases}$$

and then noting that

$$1 - L(p) = \frac{E[X] - E[X I(X)]}{E[X]} = \frac{E[X J(X)]}{E[X]}$$

Conditioning on $J(X)$ gives that

$$\begin{aligned}
E[X J(X)] &= E[X J(X) \mid J(X) = 1]\,P(J(X) = 1) + E[X J(X) \mid J(X) = 0]\,P(J(X) = 0) \\
&= E[X \mid X \ge \xi_p](1 - p)
\end{aligned}$$

which shows that

$$1 - L(p) = \frac{E[X \mid X \ge \xi_p](1 - p)}{E[X]} \qquad (7.2)$$


Example 7b

If $F$ is the distribution function of an exponential random variable with mean 1, then
$p = F(\xi_p) = 1 - e^{-\xi_p}$, and so $\xi_p = -\log(1 - p)$. Because the lack of memory
property of the exponential implies that $E[X \mid X > \xi_p] = \xi_p + E[X] = \xi_p + 1$,
we obtain from Equation (7.2) that the fraction of all income that is earned by those
earning at least $\xi_p$ is

$$\begin{aligned}
1 - L(p) &= (\xi_p + 1)(1 - p) \\
&= (1 - \log(1 - p))(1 - p) \\
&= 1 - p - (1 - p)\log(1 - p)
\end{aligned}$$

giving that

$$L(p) = p + (1 - p)\log(1 - p)$$


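Example 7c

If $F$ is the distribution function of a Pareto random variable with parameters $\lambda > 0$,
$a > 0$, then $F(x) = 1 - \frac{a^\lambda}{x^\lambda}$, $x \ge a$. Consequently, $p = F(\xi_p) = 1 - \frac{a^\lambda}{\xi_p^\lambda}$, giving that

$$\xi_p^\lambda = \frac{a^\lambda}{1 - p} \qquad \text{or} \qquad \xi_p = a(1 - p)^{-1/\lambda}$$

When $\lambda > 1$, it was shown in Section 5.6.5 that $E[X] = \frac{\lambda a}{\lambda - 1}$. In addition, it was shown
in Example 5f of Chapter 6 that if $X$ is Pareto with parameters $\lambda$, $a$ then the conditional
distribution of $X$ given that it exceeds $x_0$, $x_0 > a$, is Pareto with parameters $\lambda$,
$x_0$. Consequently, when $\lambda > 1$, $E[X \mid X > \xi_p] = \frac{\lambda \xi_p}{\lambda - 1}$, and thus Equation (7.2) yields
that

$$1 - L(p) = \frac{E[X \mid X > \xi_p](1 - p)}{E[X]} = \frac{\xi_p (1 - p)}{a} = (1 - p)^{1 - 1/\lambda}$$

or

$$L(p) = 1 - (1 - p)^{\frac{\lambda - 1}{\lambda}}$$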
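
The exponential and Pareto formulas can be compared with the defining ratio (income of the lowest $100p$ percent divided by total income) on a large artificial population. The sketch below is illustrative only: instead of random sampling it uses the deterministic incomes $F^{-1}((k + .5)/n)$, and the function names, the population size, and the choices $\lambda = 3$, $a = 1$ are assumptions.

```python
from math import log


def lorenz_exponential(p):
    """L(p) = p + (1 - p) log(1 - p) for an exponential with mean 1 (p < 1)."""
    return p + (1 - p) * log(1 - p)


def lorenz_pareto(p, lam):
    """L(p) = 1 - (1 - p)^((lam - 1)/lam) for a Pareto with lam > 1."""
    return 1 - (1 - p) ** ((lam - 1) / lam)


def lorenz_from_quantiles(quantile, p, n=100_000):
    """Share of total income held by the lowest 100p percent of an
    artificial population whose incomes are quantile((k + .5)/n)."""
    incomes = [quantile((k + 0.5) / n) for k in range(n)]  # already increasing
    cutoff = int(p * n)
    return sum(incomes[:cutoff]) / sum(incomes)


def exponential_quantile(u):
    """Inverse of F(x) = 1 - exp(-x)."""
    return -log(1 - u)


def pareto_quantile(u, lam=3.0, a=1.0):
    """Inverse of F(x) = 1 - (a/x)^lam, that is, xi_u = a (1 - u)^(-1/lam)."""
    return a * (1 - u) ** (-1 / lam)


if __name__ == "__main__":
    for p in (0.2, 0.5, 0.8):
        print(p,
              lorenz_exponential(p), lorenz_from_quantiles(exponential_quantile, p),
              lorenz_pareto(p, 3.0), lorenz_from_quantiles(pareto_quantile, p))
```

The two columns for each distribution should agree to about three decimal places; the small discrepancy comes from discretizing the heavy upper tail.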


We now prove some properties of the function L(p). 


$L(p)$ is an increasing, convex function of $p$, such that $L(p) \le p$.


Proof That $L(p)$ increases in $p$ follows from its definition. To prove convexity we
must show that $L(p + a) - L(p)$ increases in $p$ for $p \le 1 - a$; or equivalently, that
the proportion of the total income earned by those with incomes between $\xi_p$ and
$\xi_{p+a}$ increases in $p$. But this follows because, for all $p$, the same proportion of the
population, namely $100a$ percent, earns between $\xi_p$ and $\xi_{p+a}$, and $\xi_p$ increases in
$p$. (For instance, 10 percent of the population have incomes in the 40 to 50 percentile
and 10 percent of the population have incomes in the 45 to 55 percentile, and as
the incomes earned by the 5 percent of the population in the 40 to 45 percentile
are all less than those earned by the 5 percent of the population in the 50 to 55
percentile, it follows that the proportion of the population income of those in the
40 to 50 percentile is less than the proportion of those in the 45 to 55 percentile.)
To establish that $L(p) \le p$, we see from Equation (7.1) that we need to show that
$E[X I(X)] \le E[X]p$. But this follows because $I(x)$, equal to 1 if $x < \xi_p$ and to 0
if $x \ge \xi_p$, is a decreasing and $h(x) = x$ is an increasing function of $x$, which from
Proposition 5.4 implies that $E[X I(X)] \le E[X]E[I(X)] = E[X]p$.


Because $L(p) \le p$ with $L(p) = p$ when all members of the population have
the same income, the area of the "hump," equal to the region between the straight
line and the Lorenz curve (the shaded region in Figure 8.3), is an indication of the
inequality of incomes.


Figure 8.3 The Hump of the Lorenz Curve (horizontal axis: $p$; vertical axis: $L(p)$)


A measure of the inequality of the incomes is given by the Gini index, which is
the ratio of the area of the hump divided by the area under the straight line $L(p) = p$.
Because the area of a triangle is one half its base times its height, it follows that the
Gini index, call it $G$, is given by

$$G = \frac{1/2 - \int_0^1 L(p)\,dp}{1/2} = 1 - 2\int_0^1 L(p)\,dp$$


Example 7d

Find the Gini index when $F$, the distribution of earnings for an individual in the
population, is uniform on $(0, 1)$, and when $F$ is exponential with rate $\lambda$.

Solution When $F$ is the uniform $(0, b)$ distribution, then as shown in Example 7a,
$L(p) = p^2$, giving that $G = 1 - 2/3 = 1/3$. When $F$ is exponential, then from
Example 7b

$$\begin{aligned}
\int_0^1 L(p)\,dp &= \int_0^1 \bigl(p + (1 - p)\log(1 - p)\bigr)\,dp \\
&= \frac{1}{2} + \int_0^1 x\log(x)\,dx
\end{aligned}$$

Integrating by parts with $u = \log x$, $dv = x\,dx$ shows that

$$\int_0^1 x\log(x)\,dx = -\int_0^1 \frac{x}{2}\,dx = -1/4$$

where L'Hôpital's rule was used to obtain that $\lim_{x \to 0} x^2\log(x) = 0$. Hence, $\int_0^1 L(p)\,dp = 1/4$,
giving that $G = 1/2$. Because larger values of $G$ indicate more inequality,
we see that the inequality is larger when the distribution is exponential than when it
is uniform.
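
The two Gini indices can be reproduced by integrating the Lorenz curves numerically. This is a minimal sketch under stated assumptions: the function name `gini`, the midpoint rule, and the step count are illustrative choices, not part of the text.

```python
from math import log


def gini(lorenz, steps=100_000):
    """Gini index G = 1 - 2 * integral from 0 to 1 of L(p) dp,
    with the integral approximated by the midpoint rule."""
    h = 1.0 / steps
    area = sum(lorenz((k + 0.5) * h) for k in range(steps)) * h
    return 1 - 2 * area


def lorenz_uniform(p):
    """L(p) = p^2 for a uniform (0, b) distribution."""
    return p**2


def lorenz_exponential(p):
    """L(p) = p + (1 - p) log(1 - p) for an exponential distribution."""
    return p + (1 - p) * log(1 - p)


if __name__ == "__main__":
    print("uniform:", gini(lorenz_uniform))
    print("exponential:", gini(lorenz_exponential))
```

The midpoints never reach $p = 1$, so the logarithm is always defined; the printed values should be close to 1/3 and 1/2.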


Summary


Two useful probability bounds are provided by the Markov
and Chebyshev inequalities. The Markov inequality is concerned
with nonnegative random variables and says that
for $X$ of that type,

$$P\{X \ge a\} \le \frac{E[X]}{a}$$

for every positive value $a$. The Chebyshev inequality,
which is a simple consequence of the Markov inequality,
states that if $X$ has mean $\mu$ and variance $\sigma^2$, then, for every
positive $k$,

$$P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^2}$$

The two most important theoretical results in probability
are the central limit theorem and the strong law of large
numbers. Both are concerned with a sequence of independent
and identically distributed random variables. The
central limit theorem says that if the random variables
have a finite mean $\mu$ and a finite variance $\sigma^2$, then the
distribution of the sum of the first $n$ of them is, for large
$n$, approximately that of a normal random variable with
mean $n\mu$ and variance $n\sigma^2$. That is, if $X_i$, $i \ge 1$, is the
sequence, then the central limit theorem states that for
every real number $a$,

$$\lim_{n \to \infty} P\left\{\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le a\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{a} e^{-x^2/2}\,dx$$

The strong law of large numbers requires only that the random
variables in the sequence have a finite mean $\mu$. It
states that with probability 1, the average of the first $n$ of
them will converge to $\mu$ as $n$ goes to infinity. This implies
that if $A$ is any specified event of an experiment for which
independent replications are performed, then the limiting
proportion of experiments whose outcomes are in $A$ will,
with probability 1, equal $P(A)$. Therefore, if we accept the
interpretation that "with probability 1" means "with certainty,"
we obtain the theoretical justification for the long-run
relative frequency interpretation of probabilities.


Problems


8.1. Suppose that $X$ is a random variable with mean and
variance both equal to 20. What can be said about
$P\{0 < X < 40\}$?


8.2. From past experience, a professor knows that the test
score of a student taking her final examination is a random
variable with mean 75.


(a) Give an upper bound for the probability that a student's
test score will exceed 85.

(b) Suppose, in addition, that the professor knows that
the variance of a student's test score is equal to 25. What
can be said about the probability that a student will score
between 65 and 85?

(c) How many students would have to take the examination
to ensure with probability at least .9 that the class
average would be within 5 of 75? Do not use the central
limit theorem.


8.3. Use the central limit theorem to solve part (c) of Prob- 
lem 8.2. 


8.4. Let $X_1, \ldots, X_{20}$ be independent Poisson random variables
with mean 1.


(a) Use the Markov inequality to obtain a bound on

$$P\left\{\sum_{i=1}^{20} X_i > 15\right\}$$

(b) Use the central limit theorem to approximate

$$P\left\{\sum_{i=1}^{20} X_i > 15\right\}$$


8.5. Fifty numbers are rounded off to the nearest inte- 
ger and then summed. If the individual round-off errors 
are uniformly distributed over (—.5,.5), approximate the 
probability that the resultant sum differs from the exact 
sum by more than 3. 


8.6. A die is continually rolled until the total sum of all 
rolls exceeds 300. Approximate the probability that at 
least 80 rolls are necessary. 


8.7. A person has 100 light bulbs whose lifetimes are inde- 
pendent exponentials with mean 5 hours. If the bulbs are 
used one at a time, with a failed bulb being replaced imme- 
diately by a new one, approximate the probability that 
there is still a working bulb after 525 hours. 


8.8. In Problem 8.7, suppose that it takes a random time, 
uniformly distributed over (0, .5), to replace a failed bulb. 
Approximate the probability that all bulbs have failed by 
time 550. 


8.9. $X_i$ is a sequence of random variables, each with mean
1.3, so that $T = \sum_{i=1}^{n} X_i$ has a gamma distribution with
variance $1.69n$. It is desired that the values of $T$ lie within
$\pm .5$ of its true mean with probability at least .9. How large
a value of $n$ should be taken?


8.10. An elevator has a capacity of carrying 8 people. 
Safety standards permit only one instance of overload per 
1000 trips. Passengers using the lift are assumed to have 
normally distributed weights with mean 80.1 kg and a stan- 
dard deviation of 15.6. Find an approximation to the value 
of this maximum. 


8.11. Many people believe that the daily change of price of
a company's stock on the stock market is a random variable
with mean 0 and variance $\sigma^2$. That is, if $Y_n$ represents
the price of the stock on the $n$th day, then

$$Y_n = Y_{n-1} + X_n, \qquad n \ge 1$$

where $X_1, X_2, \ldots$ are independent and identically distributed
random variables with mean 0 and variance $\sigma^2$.
Suppose that the stock's price today is 100. If $\sigma^2 = 1$, what
can you say about the probability that the stock's price will
exceed 105 after 10 days?


8.12. The performance of a six-member relay athletics 
team is being modeled by a sports research team. The 
slowest runner is put first and the fastest last. Training 
methods used ensure that the time taken by each run- 
ner for running 1/6 of one lap is uniformly distributed. 
Average time for the $i$th member has been estimated at
$20 - (i - 1)/2$ seconds. The absolute minimum that
applies to all runners is 15 seconds. Estimate an upper 
bound for the probability that the team will cover one 
whole lap in less than two minutes. 


8.13. Student scores on exams given by a certain instructor
have mean 74 and standard deviation 14. This instructor is 
about to give two exams, one to a class of size 25 and the 
other to a class of size 64. 


(a) Approximate the probability that the average test 
score in the class of size 25 exceeds 80. 

(b) Repeat part (a) for the class of size 64. 

(c) Approximate the probability that the average test 
score in the larger class exceeds that of the other class by 
more than 2.2 points. 

(d) Approximate the probability that the average test 
score in the smaller class exceeds that of the other class 
by more than 2.2 points. 


8.14. A certain component is critical to the operation of an 
electrical system and must be replaced immediately upon 
failure. If the mean lifetime of this type of component is 
100 hours and its standard deviation is 30 hours, how many 
of these components must be in stock so that the probabil- 
ity that the system is in continual operation for the next 
2000 hours is at least .95? 


8.15. An insurance company has 10,000 automobile pol- 
icyholders. The expected yearly claim per policyholder is 
$240, with a standard deviation of $800. Approximate the 
probability that the total yearly claim exceeds $2.7 million. 


8.16. Apple producers Wenzu and Xandru enroll in a con- 
test. Records show that Wenzu’s apples weigh 94.1 grams 
on average, with a standard deviation of 171, while Xan- 
dru’s apples weigh 90.4 grams, with a standard deviation 
of 21.9. Judges randomly pick samples of sizes 35 and 20 
from Wenzu's and Xandru's produce, respectively. The
winner will be the person whose sample’s average weight 
will exceed 100 grams. If there is a tie, the two contestants 
will share the prize. 


(a) Wenzu claims that the selection process is unfair as the 
relative sample sizes give Xandru’s sample a greater prob- 
ability of exceeding 100 grams. Work out the two proba- 
bilities and compare. 

(b) Work out the probability that Xandru’s sample mean 
exceeds that of Wenzu’s. 


8.17. Redo Example 5b under the assumption that the
number of man-woman pairs is (approximately) normally
distributed. Does this seem like a reasonable supposition? 


8.18. Repeat part (a) of Problem 8.2 when it is known 
that the variance of a student’s test score is equal 
to 25. 


8.19. Each player of a soccer team has to shoot a ball at 
a distant target until three successive hits are registered. 
Let p denote the probability of a player hitting the tar- 
get. Obtain an upper bound for the probability of a player 
needing more than 10 attempts to successfully hit the tar- 
get thrice. Let p be estimated at .8 for skillful players. 
These players claim that they will not exceed 10 attempts
more than 10% of the time. Is their claim in line with the 
bound obtained? 


8.20. A random variable $X$ takes values within the interval
$[0, K]$ with $E[X] = 100$. Show that


(a) $E[X^a] \le K^a$ for all $a \ge 0$;

(b) ELX>] = 10; 

(c)0 = E [cos (3 ~ ~)| > cos (F + 1); 
(d) $E[1 - \log(1 + X)] \ge 1 - \log 101$.


8.21. Let X be a positive random variable. Show that 
(a) log E[e*] = e#llos 1, 
(b) $E[\log\sqrt{X}] \le \frac{1}{2}\log E[X]$.


8.22. Would the results of Example 5f change if the 
investor were allowed to divide her money and invest the 
fraction a,0 < a < 1, in the risky proposition and invest 
the remainder in the risk-free venture? Her return for such 
a split investment would be R= aX + (1 — a)m. 


8.23. Let X be a Poisson random variable with mean 20. 


(a) Use the Markov inequality to obtain an upper 
bound on 
$$p = P\{X \ge 26\}$$


(b) Use the one-sided Chebyshev inequality to obtain an 
upper bound on p. 

(c) Use the Chernoff bound to obtain an upper bound 
on p. 

(d) Approximate p by making use of the central limit the- 
orem. 

(e) Determine p by running an appropriate program. 


8.24. If $X$ is a Poisson random variable with mean 100,
then $P\{X > 120\}$ is approximately

(a) .02,
(b) .5 or
(c) .3?


8.25. Suppose that the distribution of earnings of members
of a population is Pareto with parameters $\lambda$, $a > 0$, where

$$\lambda = \frac{\log 5}{\log 4} \approx 1.161.$$

(a) Show that the top 20 percent of earners earn 80 percent
of the total earnings.

(b) Show that the top 20 percent of the top 20 percent of
earners earn 80 percent of the earnings of the top 20 percent
of earners. (That is, show that the top 4 percent of all
earners earn 80 percent of the total earnings of the top 20
percent of all earners.)


8.26. Let $X$ be a positive random variable. Show that
$E[\sqrt{X}e^{-X}] \le \sqrt{E[X]}\,E[e^{-X}]$.


8.27. If $L(p)$ is the Lorenz curve associated with the random
variable $X$, show that

$$L(p) = \frac{E[X \mid X < \xi_p]\,p}{E[X]}$$


8.28. Suppose that $L(p)$ is the Lorenz curve associated
with the random variable $X$ and that $c > 0$.


(a) Find the Lorenz curve associated with the random variable
$cX$.

(b) Show that $L_c(p)$, the Lorenz curve associated with the
random variable $X + c$, is

$$L_c(p) = \frac{L(p)E[X] + pc}{E[X] + c}$$

(c) Verify that the answer to part (b) is in accordance with
the formulas given in Example 7a in the case that $X$ is uniform
over the interval $(0, b - a)$ and $c = a$.


Theoretical Exercises


8.1. If $X$ has variance $\sigma^2$, then $\sigma$, the positive square root
of the variance, is called the standard deviation. If $X$ has
mean $\mu$ and standard deviation $\sigma$, show that

$$P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^2}$$

8.2. If $X$ has mean $\mu$ and standard deviation $\sigma$, the ratio
$r \equiv |\mu|/\sigma$ is called the measurement signal-to-noise ratio
of $X$. The idea is that $X$ can be expressed as $X = \mu + (X - \mu)$,
with $\mu$ representing the signal and $X - \mu$ the
noise. If we define $|(X - \mu)/\mu| \equiv D$ as the relative deviation
of $X$ from its signal (or mean) $\mu$, show that for $a > 0$,

$$P\{D \le a\} \ge 1 - \frac{1}{r^2 a^2}$$


8.3. Compute the measurement signal-to-noise ratio,
that is, $|\mu|/\sigma$, where $\mu = E[X]$ and $\sigma^2 = \mathrm{Var}(X)$, of the
following random variables:

(a) Poisson with mean $\lambda$;

(b) binomial with parameters $n$ and $p$;

(c) geometric with mean $1/p$;

(d) uniform over $(a, b)$;

(e) exponential with mean $1/\lambda$;

(f) normal with parameters $\mu$, $\sigma^2$.


8.4. Let $Z_n$, $n \ge 1$, be a sequence of random variables
and $c$ a constant such that for each $\varepsilon > 0$,
$P\{|Z_n - c| > \varepsilon\} \to 0$ as $n \to \infty$. Show that for any bounded continuous
function $g$,

$$E[g(Z_n)] \to g(c) \qquad \text{as } n \to \infty$$


8.5. Let $f(x)$ be a continuous function defined for $0 \le x \le 1$.
Consider the functions

$$B_n(x) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right)\binom{n}{k} x^k (1 - x)^{n-k}$$

(called Bernstein polynomials) and prove that

$$\lim_{n \to \infty} B_n(x) = f(x)$$

Hint: Let $X_1, X_2, \ldots$ be independent Bernoulli random
variables with mean $x$. Show that

$$B_n(x) = E\left[f\left(\frac{X_1 + \cdots + X_n}{n}\right)\right]$$

and then use Theoretical Exercise 8.4.


Since it can be shown that the convergence of $B_n(x)$ to
$f(x)$ is uniform in $x$, the preceding reasoning provides a
probabilistic proof of the famous Weierstrass theorem of
analysis, which states that any continuous function on a
closed interval can be approximated arbitrarily closely by
a polynomial.
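
To see the approximation at work (this illustrates the statement and is not a proof), the Bernstein polynomials can be evaluated directly. The sketch below is an assumption-laden example: the function names, the test function $f(x) = |x - 1/2|$, and the evaluation grid are illustrative choices.

```python
from math import comb


def bernstein(f, n, x):
    """B_n(x) = sum over k of f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k)
               for k in range(n + 1))


def max_error(f, n, grid=101):
    """Largest |B_n(x) - f(x)| over an evenly spaced grid on [0, 1]."""
    points = [j / (grid - 1) for j in range(grid)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in points)


def kink(x):
    """A continuous but non-differentiable test function on [0, 1]."""
    return abs(x - 0.5)


if __name__ == "__main__":
    for n in (5, 20, 80, 200):
        print(n, max_error(kink, n))
```

For $f(x) = x^2$ the polynomial is known in closed form, $B_n(x) = x^2 + x(1 - x)/n$, which follows from $E[(X_1 + \cdots + X_n)^2/n^2]$ for Bernoulli variables with mean $x$ and gives a convenient correctness check.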


8.6. (a) Let $X$ be a discrete random variable whose possible
values are $1, 2, \ldots$. If $P\{X = k\}$ is nonincreasing in
$k = 1, 2, \ldots$, prove that

$$P\{X = k\} \le \frac{2E[X]}{k^2}$$

(b) Let $X$ be a nonnegative continuous random variable
having a nonincreasing density function. Show that

$$f(x) \le \frac{2E[X]}{x^2} \qquad \text{for all } x > 0$$


8.7. Suppose that a fair die is rolled 100 times. Let $X_i$ be
the value obtained on the $i$th roll. Compute an approximation
for

$$P\left\{\prod_{i=1}^{100} X_i \le a^{100}\right\}, \qquad 1 < a < 6$$


8.8. Explain why a gamma random variable with parameters
$(t, \lambda)$ has an approximately normal distribution when $t$
is large.


8.9. Suppose a fair coin is tossed 1000 times. If the first 100 
tosses all result in heads, what proportion of heads would 
you expect on the final 900 tosses? Comment on the state- 
ment “The strong law of large numbers swamps but does 
not compensate.” 


8.10. If $X$ is a Poisson random variable with mean $\lambda$, show
that for $i < \lambda$,

$$P\{X \le i\} \le \frac{e^{-\lambda}(e\lambda)^i}{i^i}$$


8.11. Let $X$ be a binomial random variable with parameters
$n$ and $p$. Show that, for $i > np$,


(a) $\min_{t > 0} e^{-ti}E[e^{tX}]$ occurs when $t$ is such that $e^t = \frac{iq}{(n - i)p}$, where $q = 1 - p$.

(b) $P\{X \ge i\} \le \frac{n^n}{i^i (n - i)^{n-i}}\,p^i (1 - p)^{n-i}$.

8.12. The Chernoff bound on a standard normal random
variable $Z$ gives $P\{Z > a\} \le e^{-a^2/2}$, $a > 0$. Show,
by considering the density of $Z$, that the right side of
the inequality can be reduced by the factor 2. That is,
show that

$$P\{Z > a\} \le \frac{1}{2}e^{-a^2/2}, \qquad a > 0$$

8.13. Show that if $E[X] < 0$ and $\theta \ne 0$ is such that $E[e^{\theta X}] = 1$,
then $\theta > 0$.


8.14. Let $X_1, X_2, \ldots$ be a sequence of independent and
identically distributed random variables with distribution
$F$, having a finite mean and variance. Whereas the central
limit theorem states that the distribution of $\sum_{i=1}^{n} X_i$
approaches a normal distribution as $n$ goes to infinity, it
gives us no information about how large $n$ need be before
the normal becomes a good approximation. Whereas in
most applications, the approximation yields good results
whenever $n \ge 20$, and oftentimes for much smaller values
of $n$, how large a value of $n$ is needed depends on the distribution
of $X_i$. Give an example of a distribution $F$ such
that the distribution of $\sum_{i=1}^{100} X_i$ is not close to a normal
distribution.

Hint: Think Poisson.


8.15. If $f$ and $g$ are density functions that are positive
over the same region, then the Kullback-Leibler divergence
from density $f$ to density $g$ is defined by

$$KL(f, g) = E_f\left[\log\left(\frac{f(X)}{g(X)}\right)\right] = \int \log\left(\frac{f(x)}{g(x)}\right) f(x)\,dx$$

where the notation $E_f[h(X)]$ is used to indicate that $X$ has
density function $f$.

(a) Show that $KL(f, f) = 0$.

(b) Use Jensen's inequality and the identity
$\log\left(\frac{f(x)}{g(x)}\right) = -\log\left(\frac{g(x)}{f(x)}\right)$, to show that $KL(f, g) \ge 0$.


8.16. Let $L(p)$ be the Lorenz curve associated with the distribution
function $F$, with density function $f$ and mean $\mu$.


(a) Show that

$$L(p) = \frac{1}{\mu}\int_0^p F^{-1}(y)\,dy$$

Hint: Starting with $L(p) = \frac{1}{\mu}\int_0^{\xi_p} x f(x)\,dx$, make the change
of variable $y = F(x)$.

(b) Use part (a) to show that $L(p)$ is convex.

(c) Show that

$$\int_0^1 L(p)\,dp = \frac{1}{\mu}\int_0^{\infty} (1 - F(x))\,x f(x)\,dx$$

(d) Verify the preceding formula by using it to compute
the Gini index of a uniform $(0, 1)$ and an exponential random
variable, comparing your answers with those given in
Example 7d.


Self-Test Problems and Exercises


8.1. The number of automobiles sold weekly at a certain
dealership is a random variable with expected value 16.
Give an upper bound to the probability that

(a) next week's sales exceed 18;

(b) next week's sales exceed 25.


8.2. Suppose in Self-Test Problem 8.1 that the variance of the number
of automobiles sold weekly is 9.

(a) Give a lower bound to the probability that next week's
sales are between 10 and 22, inclusively.

(b) Give an upper bound to the probability that next
week's sales exceed 18.


8.3. If

$$E[X] = 75 \qquad E[Y] = 75 \qquad \mathrm{Var}(X) = 10$$
$$\mathrm{Var}(Y) = 12 \qquad \mathrm{Cov}(X, Y) = -3$$

give an upper bound to
(a) $P\{|X - Y| > 15\}$;
(b) $P\{X > Y + 15\}$;
(c) $P\{Y > X + 15\}$.


8.4. Suppose that the number of units produced daily at
factory A is a random variable with mean 20 and standard
deviation 3 and the number produced at factory B
is a random variable with mean 18 and standard deviation
6. Assuming independence, derive an upper bound for the
probability that more units are produced today at factory
B than at factory A.


8.5. The amount of time that a certain type of component
functions before failing is a random variable with probability
density function

$$f(x) = 2x, \qquad 0 < x < 1$$

Once the component fails, it is immediately replaced by
another one of the same type. If we let $X_i$ denote the lifetime
of the $i$th component to be put in use, then $S_n = \sum_{i=1}^{n} X_i$
represents the time of the $n$th failure. The long-term rate
at which failures occur, call it $r$, is defined by

$$r = \lim_{n \to \infty} \frac{n}{S_n}$$

Assuming that the random variables $X_i$, $i \ge 1$, are independent,
determine $r$.


8.6. In Self-Test Problem 8.5, how many components 
would one need to have on hand to be approximately 90 
percent certain that the stock would last at least 35 days? 


8.7. The servicing of a machine requires two separate steps, 
with the time needed for the first step being an exponen- 
tial random variable with mean .2 hour and the time for 
the second step being an independent exponential ran- 
dom variable with mean .3 hour. If a repair person has 20 
machines to service, approximate the probability that all 
the work can be completed in 8 hours. 


8.8. On each bet, a gambler loses 1 with probability .7,
loses 2 with probability .2, or wins 10 with probability .1.
Approximate the probability that the gambler will be los- 
ing after his first 100 bets. 


8.9. Determine $t$ so that the probability that the repair
person in Self-Test Problem 8.7 finishes the 20 jobs within
time $t$ is approximately equal to .95.


8.10. A tobacco company claims that the amount of nico- 
tine in one of its cigarettes is a random variable with mean 
2.2 mg and standard deviation .3 mg. However, the aver- 
age nicotine content of 100 randomly chosen cigarettes 
was 3.1 mg. Approximate the probability that the average 
would have been as high as or higher than 3.1 if the com- 
pany’s claims were true. 


8.11. Each of the batteries in a collection of 40 batteries 
is equally likely to be either a type A or a type B battery. 
Type A batteries last for an amount of time that has mean 
50 and standard deviation 15; type B batteries last for an 


amount of time that has mean 30 and standard deviation 
6. 


(a) Approximate the probability that the total life of all 40 
batteries exceeds 1700. 

(b) Suppose it is known that 20 of the batteries are type A 
and 20 are type B. Now approximate the probability that 
the total life of all 40 batteries exceeds 1700. 


8.12. A clinic is equally likely to have 2, 3, or 4 doctors 
volunteer for service on a given day. No matter how many 
volunteer doctors there are on a given day, the numbers 
of patients seen by these doctors are independent Poisson 
random variables with mean 30. Let X denote the number 
of patients seen in the clinic on a given day. 


(a) Find E[X]. 

(b) Find Var(X). 

(c) Use a table of the standard normal probability distri- 
bution to approximate P{X > 65}. 


8.13. The strong law of large numbers states that with
probability 1, the successive arithmetic averages of a
sequence of independent and identically distributed random
variables converge to their common mean $\mu$. What
do the successive geometric averages converge to? That is,
what is

$$\lim_{n \to \infty}\left(\prod_{i=1}^{n} X_i\right)^{1/n}$$


8.14. Each new book donated to a library must be pro- 
cessed. Suppose that the time it takes to process a book 
has mean 10 minutes and standard deviation 3 minutes. If 
a librarian has 40 books to process, 

(a) approximate the probability that it will take more than 
420 minutes to process all these books; 

(b) approximate the probability that at least 25 books will 
be processed in the first 240 minutes. 


What assumptions have you made? 


8.15. Prove Chebyshev's sum inequality, which says that
if $a_1 \ge a_2 \ge \cdots \ge a_n$ and $b_1 \ge b_2 \ge \cdots \ge b_n$, then

$$n\sum_{i=1}^{n} a_i b_i \ge \left(\sum_{i=1}^{n} a_i\right)\left(\sum_{i=1}^{n} b_i\right).$$


9.1 The Poisson Process


Before we define a Poisson process, let us recall that a function $f$ is said to be $o(h)$ if

$$\lim_{h \to 0} \frac{f(h)}{h} = 0$$

That is, $f$ is $o(h)$ if, for small values of $h$, $f(h)$ is small even in relation to $h$. Suppose
now that "events" are occurring at random points in time, and let $N(t)$ denote the
number of events that occur in the time interval $[0, t]$. The collection of random
variables $\{N(t), t \ge 0\}$ is said to be a Poisson process having rate $\lambda$, $\lambda > 0$, if


(i) $N(0) = 0$.
(ii) The numbers of events that occur in disjoint time intervals are independent.
(iii) The distribution of the number of events that occur in a given interval depends
only on the length of that interval and not on its location.
(iv) $P\{N(h) = 1\} = \lambda h + o(h)$.
(v) $P\{N(h) \ge 2\} = o(h)$.


Thus, condition (i) states that the process begins at time 0. Condition (ii), the
independent increment assumption, states, for instance, that the number of events
that occur by time $t$ [that is, $N(t)$] is independent of the number of events that occur
between $t$ and $t + s$ [that is, $N(t + s) - N(t)$]. Condition (iii), the stationary increment
assumption, states that the probability distribution of $N(t + s) - N(t)$ is the same
for all values of $t$.

In Chapter 4, we presented an argument, based on the Poisson distribution being
a limiting version of the binomial distribution, that the foregoing conditions imply
that $N(t)$ has a Poisson distribution with mean $\lambda t$. We will now obtain this result by
a different method.

Lemma 1.1

For a Poisson process with rate $\lambda$,

$$P\{N(t) = 0\} = e^{-\lambda t}$$

Proof Let $P_0(t) = P\{N(t) = 0\}$. We derive a differential equation for $P_0(t)$ in the
following manner:

$$\begin{aligned}
P_0(t + h) &= P\{N(t + h) = 0\} \\
&= P\{N(t) = 0, N(t + h) - N(t) = 0\} \\
&= P\{N(t) = 0\}P\{N(t + h) - N(t) = 0\} \\
&= P_0(t)[1 - \lambda h + o(h)]
\end{aligned}$$

where the final two equations follow from condition (ii) plus the fact that conditions
(iv) and (v) imply that $P\{N(h) = 0\} = 1 - \lambda h + o(h)$. Hence,

$$\frac{P_0(t + h) - P_0(t)}{h} = -\lambda P_0(t) + \frac{o(h)}{h}$$

Now, letting $h \to 0$, we obtain

$$P_0'(t) = -\lambda P_0(t)$$

or, equivalently,

$$\frac{P_0'(t)}{P_0(t)} = -\lambda$$

which implies, by integration, that

$$\log P_0(t) = -\lambda t + c$$

or

$$P_0(t) = Ke^{-\lambda t}$$

Since $P_0(0) = P\{N(0) = 0\} = 1$, we arrive at

$$P_0(t) = e^{-\lambda t}$$


For a Poisson process, let $T_1$ denote the time the first event occurs. Further, for
$n > 1$, let $T_n$ denote the time elapsed between the $(n - 1)$st and the $n$th event. The
sequence $\{T_n, n = 1, 2, \ldots\}$ is called the sequence of interarrival times. For instance, if
$T_1 = 5$ and $T_2 = 10$, then the first event of the Poisson process would have occurred
at time 5 and the second at time 15.

We shall now determine the distribution of the $T_n$. To do so, we first note that
the event $\{T_1 > t\}$ takes place if and only if no events of the Poisson process occur
in the interval $[0, t]$; thus,

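$$P\{T_1 > t\} = P\{N(t) = 0\} = e^{-\lambda t}$$

Hence, $T_1$ has an exponential distribution with mean $1/\lambda$. Now,

$$P\{T_2 > t\} = E[P\{T_2 > t \mid T_1\}]$$

However,

$$\begin{aligned}
P\{T_2 > t \mid T_1 = s\} &= P\{0 \text{ events in } (s, s + t] \mid T_1 = s\} \\
&= P\{0 \text{ events in } (s, s + t]\} \\
&= e^{-\lambda t}
\end{aligned}$$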


444 Chapter 9 Additional Topics in Probability 


where the last two equations followed from the assumptions about independent and
stationary increments. From the preceding, we conclude that $T_2$ is also an exponential
random variable with mean $1/\lambda$ and, furthermore, that $T_2$ is independent of $T_1$.
Repeating the same argument yields Proposition 1.1.


Proposition 1.1

$T_1, T_2, \ldots$ are independent exponential random variables, each with mean $1/\lambda$.


Another quantity of interest is $S_n$, the arrival time of the $n$th event, also called
the waiting time until the $n$th event. It is easily seen that

$$S_n = \sum_{i=1}^{n} T_i, \qquad n \ge 1$$

hence, from Proposition 1.1 and the results of Section 5.6.1, it follows that $S_n$ has a
gamma distribution with parameters $n$ and $\lambda$. That is, the probability density of $S_n$ is
given by

$$f_{S_n}(x) = \lambda e^{-\lambda x}\frac{(\lambda x)^{n-1}}{(n - 1)!}, \qquad x \ge 0$$

We are now ready to prove that $N(t)$ is a Poisson random variable with mean $\lambda t$.


Theorem 1.1

For a Poisson process with rate $\lambda$,

$$P\{N(t) = n\} = \frac{e^{-\lambda t}(\lambda t)^n}{n!}$$

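Proof Note that the $n$th event of the Poisson process will occur before or at time $t$ if
and only if the number of events that occur by $t$ is at least $n$. That is,

$$N(t) \ge n \iff S_n \le t$$

so

$$\begin{aligned}
P\{N(t) = n\} &= P\{N(t) \ge n\} - P\{N(t) \ge n + 1\} \\
&= P\{S_n \le t\} - P\{S_{n+1} \le t\} \\
&= \int_0^t \lambda e^{-\lambda x} \frac{(\lambda x)^{n-1}}{(n-1)!}\, dx - \int_0^t \lambda e^{-\lambda x} \frac{(\lambda x)^n}{n!}\, dx
\end{aligned}$$

But the integration-by-parts formula $\int u\, dv = uv - \int v\, du$ with $u = e^{-\lambda x}$ and
$dv = \lambda[(\lambda x)^{n-1}/(n-1)!]\, dx$ yields

$$\int_0^t \lambda e^{-\lambda x} \frac{(\lambda x)^{n-1}}{(n-1)!}\, dx = e^{-\lambda t} \frac{(\lambda t)^n}{n!} + \int_0^t \lambda e^{-\lambda x} \frac{(\lambda x)^n}{n!}\, dx$$

which completes the proof.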
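
As an informal check of Theorem 1.1, the following sketch builds a Poisson process from independent exponential interarrival times (Proposition 1.1) and compares the simulated counts with the Poisson formula. The rate, the time horizon, the number of trials, and the seed are arbitrary illustrative choices, and the function names are not from the text.

```python
import math
import random

def count_events(rate, t, rng):
    """Count events in [0, t] by summing exponential interarrival times."""
    elapsed, n = 0.0, 0
    while True:
        elapsed += rng.expovariate(rate)  # T_i is exponential with mean 1/rate
        if elapsed > t:
            return n
        n += 1

def poisson_pmf(rate, t, n):
    """Theorem 1.1: P{N(t) = n} = exp(-rate*t) * (rate*t)**n / n!."""
    return math.exp(-rate * t) * (rate * t) ** n / math.factorial(n)

rng = random.Random(0)                # fixed seed, chosen only for repeatability
rate, t, trials = 2.0, 3.0, 20000     # illustrative values (assumptions)
counts = [count_events(rate, t, rng) for _ in range(trials)]
empirical_mean = sum(counts) / trials
empirical_p6 = counts.count(6) / trials

print(empirical_mean, rate * t)               # sample mean vs. lambda * t
print(empirical_p6, poisson_pmf(rate, t, 6))  # sample frequency vs. formula
```

The simulated mean should be close to $\lambda t$, and the frequency of any particular count should be close to the corresponding Poisson probability, up to sampling error.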


9.2 Markov Chains


Consider a sequence of random variables $X_0, X_1, \ldots$, and suppose that the set of
possible values of these random variables is $\{0, 1, \ldots, M\}$. It will be helpful to interpret
$X_n$ as being the state of some system at time $n$, and, in accordance with this
interpretation, we say that the system is in state $i$ at time $n$ if $X_n = i$. The sequence
of random variables is said to form a Markov chain if, each time the system is in state
$i$, there is some fixed probability, call it $P_{ij}$, that the system will next be in state $j$.
That is, for all $i_0, \ldots, i_{n-1}, i, j$,

$$P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\} = P_{ij}$$

The values $P_{ij}$, $0 \le i \le M$, $0 \le j \le M$, are called the transition probabilities of the
Markov chain, and they satisfy

$$P_{ij} \ge 0, \qquad \sum_{j=0}^{M} P_{ij} = 1, \qquad i = 0, 1, \ldots, M$$

(Why?) It is convenient to arrange the transition probabilities $P_{ij}$ in a square array
as follows:

$$\begin{Vmatrix}
P_{00} & P_{01} & \cdots & P_{0M} \\
P_{10} & P_{11} & \cdots & P_{1M} \\
\vdots & \vdots & & \vdots \\
P_{M0} & P_{M1} & \cdots & P_{MM}
\end{Vmatrix}$$

Such an array is called a matrix.

Knowledge of the transition probability matrix and of the distribution of $X_0$
enables us, in theory, to compute all probabilities of interest. For instance, the joint
probability mass function of $X_0, \ldots, X_n$ is given by

$$\begin{aligned}
P\{X_n = i_n, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\}
&= P\{X_n = i_n \mid X_{n-1} = i_{n-1}, \ldots, X_0 = i_0\}\, P\{X_{n-1} = i_{n-1}, \ldots, X_0 = i_0\} \\
&= P_{i_{n-1}, i_n}\, P\{X_{n-1} = i_{n-1}, \ldots, X_0 = i_0\}
\end{aligned}$$

and continual repetition of this argument demonstrates that the preceding is equal to

$$P_{i_{n-1}, i_n} P_{i_{n-2}, i_{n-1}} \cdots P_{i_1, i_2} P_{i_0, i_1} P\{X_0 = i_0\}$$


Example 2a
Suppose that whether it rains tomorrow depends on previous weather conditions
only through whether it is raining today. Suppose further that if it is raining today,
then it will rain tomorrow with probability $\alpha$, and if it is not raining today, then it will
rain tomorrow with probability $\beta$.

If we say that the system is in state 0 when it rains and state 1 when it does not,
then the preceding system is a two-state Markov chain having transition probability
matrix

$$\begin{Vmatrix}
\alpha & 1 - \alpha \\
\beta & 1 - \beta
\end{Vmatrix}$$

That is, $P_{00} = \alpha = 1 - P_{01}$, $P_{10} = \beta = 1 - P_{11}$. ■


Example 2b
Consider a gambler who either wins 1 unit with probability $p$ or loses 1 unit with
probability $1 - p$ at each play of the game. If we suppose that the gambler will quit
playing when his fortune hits either 0 or $M$, then the gambler's sequence of fortunes
is a Markov chain having transition probabilities

$$P_{i,i+1} = p = 1 - P_{i,i-1}, \quad i = 1, \ldots, M - 1$$
$$P_{00} = P_{MM} = 1$$
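
The product formula for the joint mass function can be sketched in code using the gambler's chain of Example 2b. The values $M = 4$ and $p = 0.4$, the starting fortune, and the helper names are illustrative assumptions, not part of the text.

```python
def gambler_matrix(M, p):
    """Transition matrix of Example 2b on states 0..M (0 and M are absorbing)."""
    P = [[0.0] * (M + 1) for _ in range(M + 1)]
    P[0][0] = P[M][M] = 1.0
    for i in range(1, M):
        P[i][i + 1] = p        # win one unit
        P[i][i - 1] = 1 - p    # lose one unit
    return P

def path_probability(P, initial, path):
    """P{X_0 = path[0], ..., X_n = path[n]}: initial probability times one-step terms."""
    prob = initial[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= P[i][j]
    return prob

P = gambler_matrix(4, 0.4)     # M = 4, p = 0.4 (illustrative values)
initial = [0, 0, 1, 0, 0]      # assume the gambler starts with fortune 2
print(path_probability(P, initial, [2, 3, 2, 1, 0]))  # equals 0.4 * 0.6**3
```

Each row of the matrix sums to 1, and a path containing an impossible step (for example, staying at fortune 2) receives probability 0.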


Example 2c
The husband-and-wife physicists Paul and Tatyana Ehrenfest considered a conceptual
model for the movement of molecules in which $M$ molecules are distributed
among 2 urns. At each time point, one of the molecules is chosen at random and is
removed from its urn and placed in the other one. If we let $X_n$ denote the number of
molecules in the first urn immediately after the $n$th exchange, then $\{X_0, X_1, \ldots\}$ is a
Markov chain with transition probabilities

$$P_{i,i+1} = \frac{M - i}{M}, \quad 0 \le i \le M$$
$$P_{i,i-1} = \frac{i}{M}, \quad 0 \le i \le M$$
$$P_{ij} = 0 \quad \text{if } j = i \text{ or } |j - i| > 1$$ ■


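Thus, for a Markov chain, $P_{ij}$ represents the probability that a system in state $i$
will enter state $j$ at the next transition. We can also define the two-stage transition
probability $P_{ij}^{(2)}$ that a system presently in state $i$ will be in state $j$ after two additional
transitions. That is,

$$P_{ij}^{(2)} = P\{X_{m+2} = j \mid X_m = i\}$$

The $P_{ij}^{(2)}$ can be computed from the $P_{ij}$ as follows:

$$\begin{aligned}
P_{ij}^{(2)} &= P\{X_2 = j \mid X_0 = i\} \\
&= \sum_{k=0}^{M} P\{X_2 = j, X_1 = k \mid X_0 = i\} \\
&= \sum_{k=0}^{M} P\{X_2 = j \mid X_1 = k, X_0 = i\}\, P\{X_1 = k \mid X_0 = i\} \\
&= \sum_{k=0}^{M} P_{kj} P_{ik}
\end{aligned}$$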


In general, we define the $n$-stage transition probabilities, denoted as $P_{ij}^{(n)}$, by

$$P_{ij}^{(n)} = P\{X_{n+m} = j \mid X_m = i\}$$

Proposition 2.1, known as the Chapman-Kolmogorov equations, shows how the $P_{ij}^{(n)}$
can be computed.


Proposition 2.1  The Chapman-Kolmogorov equations

$$P_{ij}^{(n)} = \sum_{k=0}^{M} P_{ik}^{(r)} P_{kj}^{(n-r)} \quad \text{for all } 0 < r < n$$


Proof

$$\begin{aligned}
P_{ij}^{(n)} &= P\{X_n = j \mid X_0 = i\} \\
&= \sum_{k} P\{X_n = j, X_r = k \mid X_0 = i\} \\
&= \sum_{k} P\{X_n = j \mid X_r = k, X_0 = i\}\, P\{X_r = k \mid X_0 = i\} \\
&= \sum_{k} P_{kj}^{(n-r)} P_{ik}^{(r)}
\end{aligned}$$
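
In matrix terms, the Chapman-Kolmogorov equations say that the $n$-stage transition matrix is the $n$th power of the one-stage matrix, so that the $r$-stage and $(n-r)$-stage matrices multiply to give the $n$-stage matrix. The sketch below checks this numerically for the two-state weather chain of Example 2a; the values $\alpha = 0.7$ and $\beta = 0.4$ and the helper names are illustrative assumptions.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """n-stage transition matrix (n >= 1) by repeated multiplication."""
    result = P
    for _ in range(n - 1):
        result = mat_mul(result, P)
    return result

alpha, beta = 0.7, 0.4                 # illustrative values (assumptions)
P = [[alpha, 1 - alpha], [beta, 1 - beta]]

P2 = mat_pow(P, 2)                     # two-stage probabilities
P5 = mat_pow(P, 5)                     # five-stage probabilities
P2_P3 = mat_mul(mat_pow(P, 2), mat_pow(P, 3))  # r = 2, n - r = 3

print(P2[0][0])                        # alpha**2 + (1 - alpha) * beta
print(P5, P2_P3)                       # the two matrices should agree
```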


Example  A random walk


An example of a Markov chain having a countably infinite state space is the random
walk, which tracks a particle as it moves along a one-dimensional axis. Suppose that
at each point in time, the particle will move either one step to the right or one step
to the left with respective probabilities $p$ and $1 - p$. That is, suppose the particle's
path follows a Markov chain with transition probabilities

$$P_{i,i+1} = p = 1 - P_{i,i-1}, \quad i = 0, \pm 1, \ldots$$

If the particle is at state $i$, then the probability that it will be at state $j$ after $n$
transitions is the probability that $(n - i + j)/2$ of these steps are to the right and
$n - [(n - i + j)/2] = (n + i - j)/2$ are to the left. Since each step will be to
the right, independently of the other steps, with probability $p$, it follows that the
preceding is just the binomial probability

$$P_{ij}^{(n)} = \binom{n}{(n - i + j)/2} p^{(n - i + j)/2} (1 - p)^{(n + i - j)/2}$$

where $\binom{n}{x}$ is taken to equal 0 when $x$ is not a nonnegative integer less than or
equal to $n$. The preceding formula can be rewritten as

$$P_{i,i+2k}^{(2n)} = \binom{2n}{n + k} p^{n+k} (1 - p)^{n-k}, \quad k = 0, \pm 1, \ldots, \pm n$$

$$P_{i,i+2k+1}^{(2n+1)} = \binom{2n+1}{n + k + 1} p^{n+k+1} (1 - p)^{n-k}, \quad k = 0, \pm 1, \ldots, \pm n, -(n + 1)$$ ■
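
The binomial formula for the random walk can be checked against a brute-force sum over all step sequences. This is only a sketch: the function names are hypothetical, and the values of $n$ and $p$ used when calling it are up to the reader.

```python
from itertools import product
from math import comb

def walk_prob(n, i, j, p):
    """n-stage probability of going from i to j in the simple random walk."""
    twice_right = n - i + j
    if twice_right % 2 or not 0 <= twice_right // 2 <= n:
        return 0.0                   # binomial coefficient taken to equal 0
    r = twice_right // 2             # number of steps to the right
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def walk_prob_brute(n, i, j, p):
    """Same probability, summing over all 2**n sequences of +1 / -1 steps."""
    total = 0.0
    for steps in product((1, -1), repeat=n):
        if i + sum(steps) == j:
            r = steps.count(1)
            total += p ** r * (1 - p) ** (n - r)
    return total

for j in range(-4, 5):
    print(j, walk_prob(4, 0, j, 0.3), walk_prob_brute(4, 0, j, 0.3))
```

States whose parity differs from that of $n - i$ are unreachable in exactly $n$ steps, which is why every other printed probability is 0.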


Although the $P_{ij}^{(n)}$ denote conditional probabilities, we can use them to derive
expressions for unconditional probabilities by conditioning on the initial state. For
instance,

$$\begin{aligned}
P\{X_n = j\} &= \sum_{i} P\{X_n = j \mid X_0 = i\}\, P\{X_0 = i\} \\
&= \sum_{i} P_{ij}^{(n)} P\{X_0 = i\}
\end{aligned}$$

For a large number of Markov chains, it turns out that $P_{ij}^{(n)}$ converges, as $n \to \infty$, to a
value $\pi_j$ that depends only on $j$. That is, for large values of $n$, the probability of being
in state $j$ after $n$ transitions is approximately equal to $\pi_j$, no matter what the initial
state was. It can be shown that a sufficient condition for a Markov chain to possess
this property is that for some $n > 0$,

$$P_{ij}^{(n)} > 0 \quad \text{for all } i, j = 0, 1, \ldots, M \qquad (2.1)$$


Markov chains that satisfy Equation (2.1) are said to be ergodic. Since Proposition 2.1
yields

$$P_{ij}^{(n+1)} = \sum_{k=0}^{M} P_{ik}^{(n)} P_{kj}$$

it follows, by letting $n \to \infty$, that for ergodic chains,

$$\pi_j = \sum_{k=0}^{M} \pi_k P_{kj} \qquad (2.2)$$

Furthermore, since $1 = \sum_{j=0}^{M} P_{ij}^{(n)}$, we also obtain, by letting $n \to \infty$,

$$\sum_{j=0}^{M} \pi_j = 1 \qquad (2.3)$$

In fact, it can be shown that the $\pi_j$, $0 \le j \le M$, are the unique nonnegative solutions
of Equations (2.2) and (2.3). All this is summed up in Theorem 2.1, which we state
without proof.


Theorem 2.1
For an ergodic Markov chain,

$$\pi_j = \lim_{n \to \infty} P_{ij}^{(n)}$$

exists, and the $\pi_j$, $0 \le j \le M$, are the unique nonnegative solutions of

$$\pi_j = \sum_{k=0}^{M} \pi_k P_{kj}$$
$$\sum_{j=0}^{M} \pi_j = 1$$

Consider Example 2a, in which we assume that if it rains today, then it will rain
tomorrow with probability $\alpha$, and if it does not rain today, then it will rain tomorrow
with probability $\beta$. From Theorem 2.1, it follows that the limiting probabilities $\pi_0$
and $\pi_1$ of rain and of no rain, respectively, are given by

$$\pi_0 = \alpha \pi_0 + \beta \pi_1$$
$$\pi_1 = (1 - \alpha)\pi_0 + (1 - \beta)\pi_1$$
$$\pi_0 + \pi_1 = 1$$

which yields

$$\pi_0 = \frac{\beta}{1 + \beta - \alpha} \qquad \pi_1 = \frac{1 - \alpha}{1 + \beta - \alpha}$$

For instance, if $\alpha = .6$ and $\beta = .3$, then the limiting probability of rain on the $n$th day
is $\pi_0 = \frac{3}{7}$. ■
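
The limiting probabilities in this example can also be approximated numerically by applying the relation $\pi_j = \sum_k \pi_k P_{kj}$ repeatedly, starting from any initial distribution. The sketch below does this for $\alpha = .6$ and $\beta = .3$; the function name, the uniform starting vector, and the number of iterations are assumptions made for illustration.

```python
def limiting_probabilities(P, steps=200):
    """Approximate pi by iterating pi <- pi P from a uniform starting vector."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(steps):
        pi = [sum(pi[k] * P[k][j] for k in range(m)) for j in range(m)]
    return pi

alpha, beta = 0.6, 0.3
P = [[alpha, 1 - alpha], [beta, 1 - beta]]   # state 0 = rain, state 1 = no rain
pi = limiting_probabilities(P)

print(pi)                                    # numerical approximation
print(beta / (1 + beta - alpha), (1 - alpha) / (1 + beta - alpha))  # closed form
```

For this chain the iteration converges quickly, and the result should agree with the closed-form values $3/7$ and $4/7$ to within rounding error.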


The quantity $\pi_j$ is also equal to the long-run proportion of time that the Markov
chain is in state $j$, $j = 0, \ldots, M$. To see intuitively why this might be so, let $P_j$ denote
the long-run proportion of time the chain is in state $j$. (It can be proven using the
strong law of large numbers that for an ergodic chain, such long-run proportions
exist and are constants.) Now, since the proportion of time the chain is in state $k$ is
$P_k$, and since, when in state $k$, the chain goes to state $j$ with probability $P_{kj}$, it follows
that the proportion of time the Markov chain is entering state $j$ from state $k$ is equal
to $P_k P_{kj}$. Summing over all $k$ shows that $P_j$, the proportion of time the Markov chain
is entering state $j$, satisfies

$$P_j = \sum_{k} P_k P_{kj}$$

Since clearly it is also true that

$$\sum_{j} P_j = 1$$

it thus follows, since by Theorem 2.1 the $\pi_j$, $j = 0, \ldots, M$, are the unique solution of
the preceding, that $P_j = \pi_j$, $j = 0, \ldots, M$. The long-run proportion interpretation of
$\pi_j$ is generally valid even when the chain is not ergodic.


Example 2f
Suppose in Example 2c that we are interested in the proportion of time that there
are $j$ molecules in urn 1, $j = 0, \ldots, M$. By Theorem 2.1, these quantities will be the
unique solution of

$$\pi_0 = \pi_1 \times \frac{1}{M}$$

$$\pi_j = \pi_{j-1} \times \frac{M - j + 1}{M} + \pi_{j+1} \times \frac{j + 1}{M}, \quad j = 1, \ldots, M - 1$$

$$\pi_M = \pi_{M-1} \times \frac{1}{M}$$

However, as it is easily checked that

$$\pi_j = \binom{M}{j} \left(\frac{1}{2}\right)^{M}, \quad j = 0, \ldots, M$$

satisfy the preceding equations, it follows that these are the long-run proportions of
time that the Markov chain is in each of the states. (See Problem 9.11 for an explanation
of how one might have guessed at the foregoing solution.) ■
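
The claim that the binomial probabilities satisfy the stationarity equations for the Ehrenfest chain of Example 2c can be verified numerically for a particular $M$. In the sketch below, $M = 6$ is an arbitrary illustrative choice and the function name is hypothetical.

```python
from math import comb

def ehrenfest_matrix(M):
    """Transition matrix for the number of molecules in urn 1 (Example 2c)."""
    P = [[0.0] * (M + 1) for _ in range(M + 1)]
    for i in range(M + 1):
        if i < M:
            P[i][i + 1] = (M - i) / M   # chosen molecule was in urn 2
        if i > 0:
            P[i][i - 1] = i / M         # chosen molecule was in urn 1
    return P

M = 6                                    # illustrative number of molecules
P = ehrenfest_matrix(M)
pi = [comb(M, j) * 0.5 ** M for j in range(M + 1)]
pi_next = [sum(pi[k] * P[k][j] for k in range(M + 1)) for j in range(M + 1)]

print(pi)
print(pi_next)   # should reproduce pi, since pi_j = sum_k pi_k P_kj
```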


9.3 Surprise, Uncertainty, and Entropy 


Consider an event $E$ that can occur when an experiment is performed. How surprised
would we be to hear that $E$ does, in fact, occur? It seems reasonable to suppose
that the amount of surprise engendered by the information that $E$ has occurred
should depend on the probability of $E$. For instance, if the experiment consists of
rolling a pair of dice, then we would not be too surprised to hear that $E$ has occurred
when $E$ represents the event that the sum of the dice is even (and thus has probability
$\frac{1}{2}$), whereas we would certainly be more surprised to hear that $E$ has occurred
when $E$ is the event that the sum of the dice is 12 (and thus has probability $\frac{1}{36}$).

In this section, we attempt to quantify the concept of surprise. To begin, let
us agree to suppose that the surprise one feels upon learning that an event $E$ has
occurred depends only on the probability of $E$, and let us denote by $S(p)$ the surprise
evoked by the occurrence of an event having probability $p$. We determine the
functional form of $S(p)$ by first agreeing on a set of reasonable conditions that $S(p)$
should satisfy and then proving that these axioms require that $S(p)$ have a specified
form. We assume throughout that $S(p)$ is defined for all $0 < p \le 1$ but is not defined
for events having $p = 0$.

Our first condition is just a statement of the intuitive fact that there is no surprise 
in hearing that an event that is sure to occur has indeed occurred. 


Axiom 1
$S(1) = 0$


Our second condition states that the more unlikely an event is to occur, the 
greater is the surprise evoked by its occurrence. 


Axiom 2 
$S(p)$ is a strictly decreasing function of $p$; that is, if $p < q$, then $S(p) > S(q)$.


The third condition is a mathematical statement of the fact that we would intui- 
tively expect a small change in p to correspond to a small change in S(p). 


Axiom 3 
S(p) is a continuous function of p. 


To motivate the final condition, consider two independent events $E$ and $F$ having
respective probabilities $P(E) = p$ and $P(F) = q$. Since $P(EF) = pq$, the surprise
evoked by the information that both $E$ and $F$ have occurred is $S(pq)$. Now, suppose
that we are told first that $E$ has occurred and then, afterward, that $F$ has also
occurred. Since $S(p)$ is the surprise evoked by the occurrence of $E$, it follows that
$S(pq) - S(p)$ represents the additional surprise evoked when we are informed that
$F$ has also occurred. However, because $F$ is independent of $E$, the knowledge that $E$
occurred does not change the probability of $F$; hence, the additional surprise should
just be $S(q)$. This reasoning suggests the final condition.


Axiom 4
$S(pq) = S(p) + S(q)$, $\quad 0 < p \le 1$, $\quad 0 < q \le 1$


We are now ready for Theorem 3.1, which yields the structure of $S(p)$.


Theorem 3.1
If $S(\cdot)$ satisfies Axioms 1 through 4, then

$$S(p) = -C \log_2 p$$

where $C$ is an arbitrary positive constant.

Proof It follows from Axiom 4 that

$$S(p^2) = S(p) + S(p) = 2S(p)$$

and by induction that

$$S(p^m) = mS(p) \qquad (3.1)$$

Also, since, for any integral $n$, $S(p) = S(p^{1/n} \cdots p^{1/n}) = nS(p^{1/n})$, it follows that

$$S(p^{1/n}) = \frac{1}{n} S(p) \qquad (3.2)$$

Thus, from Equations (3.1) and (3.2), we obtain

$$S(p^{m/n}) = mS(p^{1/n}) = \frac{m}{n} S(p)$$

which is equivalent to

$$S(p^x) = xS(p) \qquad (3.3)$$

whenever $x$ is a positive rational number. But by the continuity of $S$ (Axiom 3), it
follows that Equation (3.3) is valid for all nonnegative $x$. (Reason this out.)

Now, for any $p$, $0 < p \le 1$, let $x = -\log_2 p$. Then $p = \left(\frac{1}{2}\right)^x$, and from
Equation (3.3),

$$S(p) = S\left(\left(\tfrac{1}{2}\right)^x\right) = xS\left(\tfrac{1}{2}\right) = -C \log_2 p$$

where $C = S\left(\frac{1}{2}\right) > S(1) = 0$ by Axioms 2 and 1.


It is usual to let C equal 1, in which case the surprise is said to be expressed in 
units of bits (short for binary digits). 

Next, consider a random variable $X$ that must take on one of the values $x_1, \ldots, x_n$
with respective probabilities $p_1, \ldots, p_n$. Since $-\log p_i$ represents the surprise evoked
if $X$ takes on the value $x_i$,† it follows that the expected amount of surprise we shall
receive upon learning the value of $X$ is given by

$$H(X) = -\sum_{i=1}^{n} p_i \log p_i$$

The quantity $H(X)$ is known in information theory as the entropy of the random
variable $X$. (In case one of the $p_i = 0$, we take $0 \log 0$ to equal 0.) It can be shown
(and we leave it as an exercise) that $H(X)$ is maximized when all of the $p_i$ are equal.
(Is this intuitive?)
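
As a small numerical illustration of surprise and entropy (with $C = 1$, so that units are bits), the sketch below evaluates the dice events mentioned at the start of this section and compares the entropy of a uniform distribution with that of a non-uniform one. The function names and the non-uniform probabilities are illustrative assumptions.

```python
from math import log2

def surprise(p):
    """Surprise, in bits, of an event having probability p (0 < p <= 1)."""
    return -log2(p)

def entropy(probs):
    """H(X) = -sum p_i log2 p_i, with 0 log 0 taken to equal 0."""
    return sum(p * surprise(p) for p in probs if p > 0)

print(surprise(1 / 2), surprise(1 / 36))   # even sum vs. a sum of 12
print(entropy([0.25] * 4))                 # uniform on four values
print(entropy([0.7, 0.1, 0.1, 0.1]))       # a less uncertain distribution
```

The uniform distribution on four values has entropy 2 bits, and the non-uniform one has strictly less, consistent with the maximization property stated above.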

Since H(X) represents the average amount of surprise one receives upon 
learning the value of X, it can also be interpreted as representing the amount of 
uncertainty that exists as to the value of X. In fact, in information theory, H(X) is 
interpreted as the average amount of information received when the value of X is 
observed. Thus, the average surprise evoked by $X$, the uncertainty of $X$, or the average
amount of information yielded by $X$ all represent the same concept viewed from
three slightly different points of view.

Now consider two random variables $X$ and $Y$ that take on the respective values
$x_1, \ldots, x_n$ and $y_1, \ldots, y_m$ with joint mass function

$$p(x_i, y_j) = P\{X = x_i, Y = y_j\}$$

† For the remainder of this chapter, we write $\log x$ for $\log_2 x$. Also, we use $\ln x$ for $\log_e x$.

It follows that the uncertainty as to the value of the random vector $(X, Y)$, denoted
by $H(X, Y)$, is given by

$$H(X, Y) = -\sum_{i} \sum_{j} p(x_i, y_j) \log p(x_i, y_j)$$

Suppose now that $Y$ is observed to equal $y_j$. In this situation, the amount of uncertainty
remaining in $X$ is given by

$$H_{Y = y_j}(X) = -\sum_{i} p(x_i \mid y_j) \log p(x_i \mid y_j)$$

where

$$p(x_i \mid y_j) = P\{X = x_i \mid Y = y_j\}$$

Hence, the average amount of uncertainty that will remain in $X$ after $Y$ is observed
is given by

$$H_Y(X) = \sum_{j} H_{Y = y_j}(X)\, p_Y(y_j)$$

where

$$p_Y(y_j) = P\{Y = y_j\}$$


Proposition 3.1 relates $H(X, Y)$ to $H(Y)$ and $H_Y(X)$. It states that the uncertainty as
to the value of $X$ and $Y$ is equal to the uncertainty of $Y$ plus the average uncertainty
remaining in $X$ when $Y$ is to be observed.


Proposition 3.1

$$H(X, Y) = H(Y) + H_Y(X)$$


Proof Using the identity $p(x_i, y_j) = p_Y(y_j)\, p(x_i \mid y_j)$ yields

$$\begin{aligned}
H(X, Y) &= -\sum_{i} \sum_{j} p(x_i, y_j) \log p(x_i, y_j) \\
&= -\sum_{i} \sum_{j} p_Y(y_j)\, p(x_i \mid y_j) [\log p_Y(y_j) + \log p(x_i \mid y_j)] \\
&= -\sum_{j} p_Y(y_j) \log p_Y(y_j) \sum_{i} p(x_i \mid y_j) - \sum_{j} p_Y(y_j) \sum_{i} p(x_i \mid y_j) \log p(x_i \mid y_j) \\
&= H(Y) + H_Y(X)
\end{aligned}$$


It is a fundamental result in information theory that the amount of uncertainty in
a random variable $X$ will, on the average, decrease when a second random variable
$Y$ is observed. Before proving this statement, we need the following lemma, whose
proof is left as an exercise.


Lemma 3.1

$$\ln x \le x - 1, \quad x > 0$$

with equality only at $x = 1$.


Theorem 3.2

$$H_Y(X) \le H(X)$$

with equality if and only if $X$ and $Y$ are independent.

Proof

$$\begin{aligned}
H_Y(X) - H(X) &= -\sum_{i} \sum_{j} p(x_i \mid y_j) \log[p(x_i \mid y_j)]\, p_Y(y_j) + \sum_{i} \sum_{j} p(x_i, y_j) \log p(x_i) \\
&= \sum_{i} \sum_{j} p(x_i, y_j) \log\left[\frac{p(x_i)}{p(x_i \mid y_j)}\right] \\
&\le \log e \sum_{i} \sum_{j} p(x_i, y_j) \left[\frac{p(x_i)}{p(x_i \mid y_j)} - 1\right] \quad \text{by Lemma 3.1} \\
&= \log e \left[\sum_{i} \sum_{j} p(x_i)\, p_Y(y_j) - \sum_{i} \sum_{j} p(x_i, y_j)\right] \\
&= \log e\, [1 - 1] \\
&= 0
\end{aligned}$$
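
Proposition 3.1 and Theorem 3.2 can be checked numerically on a small joint mass function. The joint table below is hypothetical (chosen only so that $X$ and $Y$ are dependent), and the variable names are not from the text.

```python
from math import log2

def H(probs):
    """Entropy in bits of a list of probabilities (0 log 0 taken as 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical joint mass function p(x_i, y_j): rows index x, columns index y.
joint = [[0.30, 0.10],
         [0.15, 0.45]]

p_x = [sum(row) for row in joint]            # marginal of X
p_y = [sum(col) for col in zip(*joint)]      # marginal of Y
H_xy = H([p for row in joint for p in row])  # H(X, Y)

# H_Y(X) = sum_j p_Y(y_j) * (entropy of X given Y = y_j)
H_x_given_y = sum(
    p_y[j] * H([joint[i][j] / p_y[j] for i in range(len(joint))])
    for j in range(len(p_y))
)

print(H_xy, H(p_y) + H_x_given_y)   # Proposition 3.1: the two should agree
print(H_x_given_y, H(p_x))          # Theorem 3.2: first is at most the second
```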


9.4 Coding Theory and Entropy 


Suppose that the value of a discrete random vector X is to be observed at location 
A and then transmitted to location B via a communication network that consists of 
two signals, 0 and 1. In order to do this, it is first necessary to encode each possible 
value of X in terms of a sequence of 0’s and 1’s. To avoid any ambiguity, it is usu- 
ally required that no encoded sequence can be obtained from a shorter encoded 
sequence by adding more terms to the shorter. 

For instance, if $X$ can take on four possible values $x_1, x_2, x_3$, and $x_4$, then one
possible coding would be

    x_1 ↔ 00
    x_2 ↔ 01
    x_3 ↔ 10
    x_4 ↔ 11                                                        (4.1)

That is, if $X = x_1$, then the message 00 is sent to location B, whereas if $X = x_2$, then
01 is sent to B, and so on. A second possible coding is

    x_1 ↔ 0
    x_2 ↔ 10
    x_3 ↔ 110
    x_4 ↔ 111                                                       (4.2)

However, a coding such as

    x_1 ↔ 0
    x_2 ↔ 1
    x_3 ↔ 00
    x_4 ↔ 01

is not allowed because the coded sequences for $x_3$ and $x_4$ are both extensions of the
one for $x_1$.

One of the objectives in devising a code is to minimize the expected number of
bits (that is, binary digits) that need to be sent from location A to location B. For
example, if

$$P\{X = x_1\} = \frac{1}{2}$$
$$P\{X = x_2\} = \frac{1}{4}$$
$$P\{X = x_3\} = \frac{1}{8}$$
$$P\{X = x_4\} = \frac{1}{8}$$

then the code given by Equation (4.2) would expect to send
$\frac{1}{2}(1) + \frac{1}{4}(2) + \frac{1}{8}(3) + \frac{1}{8}(3) = 1.75$ bits, whereas the code given by
Equation (4.1) would expect to send 2 bits. Hence, for the preceding set of probabilities,
the encoding in Equation (4.2) is more efficient than that in Equation (4.1).
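
The comparison of the two codings, and the rule that no codeword may be an extension of another, can be sketched as follows. The function and variable names are illustrative assumptions.

```python
def is_prefix_free(codewords):
    """True if no codeword is an extension of a different codeword."""
    return not any(a != b and b.startswith(a)
                   for a in codewords for b in codewords)

def expected_length(codewords, probs):
    """Expected number of bits sent: sum of len(codeword) * probability."""
    return sum(len(c) * p for c, p in zip(codewords, probs))

probs = [1 / 2, 1 / 4, 1 / 8, 1 / 8]
code_41 = ["00", "01", "10", "11"]      # coding (4.1)
code_42 = ["0", "10", "110", "111"]     # coding (4.2)
bad_code = ["0", "1", "00", "01"]       # the disallowed coding

print(expected_length(code_41, probs), expected_length(code_42, probs))
print(is_prefix_free(code_41), is_prefix_free(code_42), is_prefix_free(bad_code))
```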

The preceding discussion raises the following question: For a given random vector
$X$, what is the maximum efficiency achievable by an encoding scheme? The answer
is that for any coding, the average number of bits that will be sent is at least as large as
the entropy of $X$. To prove this result, known in information theory as the noiseless
coding theorem, we shall need Lemma 4.1.


Lemma 4.1
Let $X$ take on the possible values $x_1, \ldots, x_N$. Then, in order to be able to encode
the values of $X$ in binary sequences (none of which is an extension of another) of
respective lengths $n_1, \ldots, n_N$, it is necessary and sufficient that

$$\sum_{i=1}^{N} \left(\frac{1}{2}\right)^{n_i} \le 1$$


Proof For a fixed set of $N$ positive integers $n_1, \ldots, n_N$, let $w_j$ denote the number of
the $n_i$ that are equal to $j$, $j = 1, \ldots$. For there to be a coding that assigns $n_i$ bits to the
value $x_i$, $i = 1, \ldots, N$, it is clearly necessary that $w_1 \le 2$. Furthermore, because no
binary sequence is allowed to be an extension of any other, we must have
$w_2 \le 2^2 - 2w_1$. (This follows because $2^2$ is the number of binary sequences of length 2, whereas
$2w_1$ is the number of sequences that are extensions of the $w_1$ binary sequences of
length 1.) In general, the same reasoning shows that we must have

$$w_n \le 2^n - w_1 2^{n-1} - w_2 2^{n-2} - \cdots - w_{n-1} 2 \qquad (4.3)$$

for $n = 1, \ldots$. In fact, a little thought should convince the reader that these conditions
are not only necessary, but also sufficient for a code to exist that assigns $n_i$ bits
to $x_i$, $i = 1, \ldots, N$.

Rewriting inequality (4.3) as

$$w_n + w_{n-1} 2 + w_{n-2} 2^2 + \cdots + w_1 2^{n-1} \le 2^n, \quad n = 1, \ldots$$

and dividing by $2^n$ yields the necessary and sufficient conditions, namely,

$$\sum_{j=1}^{n} w_j \left(\frac{1}{2}\right)^{j} \le 1 \quad \text{for all } n \qquad (4.4)$$

However, because $\sum_{j=1}^{n} w_j \left(\frac{1}{2}\right)^{j}$ is increasing in $n$, it follows that Equation (4.4) will be
true if and only if

$$\sum_{j=1}^{\infty} w_j \left(\frac{1}{2}\right)^{j} \le 1$$

The result is now established, since, by the definition of $w_j$ as the number of $n_i$ that
equal $j$, it follows that

$$\sum_{j=1}^{\infty} w_j \left(\frac{1}{2}\right)^{j} = \sum_{i=1}^{N} \left(\frac{1}{2}\right)^{n_i}$$
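
The sufficiency half of Lemma 4.1 can be made concrete: given lengths whose sum $\sum_i (1/2)^{n_i}$ does not exceed 1, one can actually write down codewords. The sketch below uses a standard "canonical code" construction (assign codewords in order of increasing length by counting in binary); this is one possible construction and is not claimed to be the argument the text has in mind. The function names are assumptions.

```python
def kraft_sum(lengths):
    """Sum of (1/2)**n_i over the requested codeword lengths."""
    return sum(0.5 ** n for n in lengths)

def build_prefix_code(lengths):
    """Return codewords with the given lengths, none extending another,
    or None when the condition of Lemma 4.1 fails."""
    if kraft_sum(lengths) > 1:
        return None
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codes = [None] * len(lengths)
    value, prev_len = 0, 0
    for i in order:
        value <<= lengths[i] - prev_len     # extend the counter to the new length
        codes[i] = format(value, "0{}b".format(lengths[i]))
        value += 1                          # next unused codeword of this length
        prev_len = lengths[i]
    return codes

print(build_prefix_code([1, 2, 3, 3]))   # lengths of coding (4.2)
print(build_prefix_code([1, 1, 2]))      # sum is 1.25 > 1, so no code exists
```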


We are now ready to prove Theorem 4.1.


Theorem 4.1  The noiseless coding theorem

Let $X$ take on the values $x_1, \ldots, x_N$ with respective probabilities $p(x_1), \ldots, p(x_N)$.
Then, for any coding of $X$ that assigns $n_i$ bits to $x_i$,

$$\sum_{i=1}^{N} n_i\, p(x_i) \ge H(X) = -\sum_{i=1}^{N} p(x_i) \log p(x_i)$$

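Proof Let $P_i = p(x_i)$, $q_i = 2^{-n_i} \Big/ \sum_{j=1}^{N} 2^{-n_j}$, $i = 1, \ldots, N$. Then

$$\begin{aligned}
-\sum_{i=1}^{N} P_i \log\left(\frac{P_i}{q_i}\right) &= -\log e \sum_{i=1}^{N} P_i \ln\left(\frac{P_i}{q_i}\right) \\
&= \log e \sum_{i=1}^{N} P_i \ln\left(\frac{q_i}{P_i}\right) \\
&\le \log e \sum_{i=1}^{N} P_i \left(\frac{q_i}{P_i} - 1\right) \quad \text{by Lemma 3.1} \\
&= 0 \quad \text{since } \sum_{i=1}^{N} P_i = \sum_{i=1}^{N} q_i = 1
\end{aligned}$$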


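Hence,

$$\begin{aligned}
-\sum_{i=1}^{N} P_i \log P_i &\le -\sum_{i=1}^{N} P_i \log q_i \\
&= \sum_{i=1}^{N} n_i P_i + \log\left(\sum_{j=1}^{N} 2^{-n_j}\right) \\
&\le \sum_{i=1}^{N} n_i P_i \quad \text{by Lemma 4.1}
\end{aligned}$$

Consider a random variable $X$ with probability mass function

$$p(x_1) = \frac{1}{2} \qquad p(x_2) = \frac{1}{4} \qquad p(x_3) = p(x_4) = \frac{1}{8}$$

Since

$$\begin{aligned}
H(X) &= -\left[\frac{1}{2} \log \frac{1}{2} + \frac{1}{4} \log \frac{1}{4} + \frac{1}{4} \log \frac{1}{8}\right] \\
&= \frac{1}{2} + \frac{2}{4} + \frac{3}{4} \\
&= 1.75
\end{aligned}$$

it follows from Theorem 4.1 that there is no more efficient coding scheme than

    x_1 ↔ 0
    x_2 ↔ 10
    x_3 ↔ 110
    x_4 ↔ 111                                                       ■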


For most random vectors, there does not exist a coding for which the average
number of bits sent attains the lower bound $H(X)$. However, it is always possible to
devise a code such that the average number of bits is within 1 of $H(X)$. To prove this,
define $n_i$ to be the integer satisfying

$$-\log p(x_i) \le n_i < -\log p(x_i) + 1$$

Now,

$$\sum_{i=1}^{N} 2^{-n_i} \le \sum_{i=1}^{N} 2^{\log p(x_i)} = \sum_{i=1}^{N} p(x_i) = 1$$

so, by Lemma 4.1, we can associate sequences of bits having lengths $n_i$ with the
$x_i$, $i = 1, \ldots, N$. The average length of such a sequence,

$$L = \sum_{i=1}^{N} n_i\, p(x_i)$$

satisfies

$$-\sum_{i=1}^{N} p(x_i) \log p(x_i) \le L < -\sum_{i=1}^{N} p(x_i) \log p(x_i) + 1$$

or

$$H(X) \le L < H(X) + 1$$
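
The choice of lengths $n_i$ in this argument is easy to compute. The sketch below applies it to a hypothetical mass function (the four probabilities are illustrative assumptions) and evaluates the resulting average length $L$ alongside the entropy.

```python
from math import ceil, log2

def shannon_lengths(probs):
    """n_i = the integer with -log2 p(x_i) <= n_i < -log2 p(x_i) + 1."""
    return [ceil(-log2(p)) for p in probs]

probs = [0.4, 0.3, 0.2, 0.1]                    # hypothetical mass function
lengths = shannon_lengths(probs)
kraft = sum(0.5 ** n for n in lengths)          # must not exceed 1 (Lemma 4.1)
H = -sum(p * log2(p) for p in probs)            # entropy of X
L = sum(n * p for n, p in zip(lengths, probs))  # average codeword length

print(lengths, kraft)
print(H, L, H + 1)    # H <= L < H + 1
```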


Suppose that 10 independent tosses of a coin having probability $p$ of coming up
heads are made at location A and the result is to be transmitted to location B. The
outcome of this experiment is a random vector $X = (X_1, \ldots, X_{10})$, where $X_i$ is 1 or
0 according to whether or not the outcome of the $i$th toss is heads. By the results of
this section, it follows that $L$, the average number of bits transmitted by any code,
satisfies

$$H(X) \le L$$

with

$$L \le H(X) + 1$$

for at least one code. Now, since the $X_i$ are independent, it follows from Proposition 3.1
and Theorem 3.2 that

$$\begin{aligned}
H(X) = H(X_1, \ldots, X_{10}) &= \sum_{i=1}^{10} H(X_i) \\
&= -10[p \log p + (1 - p) \log(1 - p)]
\end{aligned}$$

If $p = \frac{1}{2}$, then $H(X) = 10$, and it follows that we can do no better than just encoding
$X$ by its actual value. For example, if the first 5 tosses come up heads and the last 5
tails, then the message 1111100000 is transmitted to location B.

However, if $p \ne \frac{1}{2}$, we can often do better by using a different coding scheme.
For instance, if $p = \frac{1}{4}$, then

$$H(X) = -10\left(\frac{1}{4} \log \frac{1}{4} + \frac{3}{4} \log \frac{3}{4}\right) = 8.11$$

Thus, there is an encoding for which the average length of the encoded message is
no greater than 9.11.

One simple coding that is more efficient in this case than the identity code is
to break up $(X_1, \ldots, X_{10})$ into 5 pairs of 2 random variables each and then, for
$i = 1, 3, 5, 7, 9$, code each of the pairs as follows:

    X_i = 0, X_{i+1} = 0  ↔  0
    X_i = 0, X_{i+1} = 1  ↔  10
    X_i = 1, X_{i+1} = 0  ↔  110
    X_i = 1, X_{i+1} = 1  ↔  111

The total message transmitted is the successive encodings of the preceding pairs.
For instance, if the outcome TTTHHTTTTH is observed, then the message
010110010 is sent. The average number of bits needed to transmit the message with
this code is

$$5\left[1\left(\frac{3}{4}\right)^2 + 2\left(\frac{1}{4}\right)\left(\frac{3}{4}\right) + 3\left(\frac{3}{4}\right)\left(\frac{1}{4}\right) + 3\left(\frac{1}{4}\right)^2\right] = \frac{135}{16} \approx 8.44$$ ■

Up to this point, we have assumed that the message sent at location A is received 
without error at location B. However, there are always certain errors that can occur 
because of random disturbances along the communications channel. Such random 
disturbances might lead, for example, to the message 00101101, sent at A, being 
received at B in the form 01101101. 

Let us suppose that a bit transmitted at location A will be correctly received at 
location B with probability p, independently from bit to bit. Such a communications 
system is called a binary symmetric channel. Suppose further that p = .8 and we 
want to transmit a message consisting of a large number of bits from A to B. Thus, 
direct transmission of the message will result in an error probability of .20 for each 
bit, which is quite high. One way to reduce this probability of bit error would be to 
transmit each bit 3 times and then decode by majority rule. That is, we could use the
following scheme:

    Encode       Decode                        Encode       Decode
    0 → 000      000, 001, 010, 100 → 0        1 → 111      111, 110, 101, 011 → 1

Note that if no more than one error occurs in transmission, then the bit will be
correctly decoded. Hence, the probability of bit error is reduced to

$$(.2)^3 + 3(.2)^2(.8) = .104$$


a considerable improvement. In fact, it is clear that we can make the probability of 
bit error as small as we want by repeating the bit many times and then decoding by 
majority rule. For instance, the scheme 


    Encode                    Decode

    0 → string of 17 0's      By majority rule
    1 → string of 17 1's


will reduce the probability of bit error to below .01. 

The problem with this type of encoding scheme is that although it decreases the 
probability of bit error, it does so at the cost of also decreasing the effective rate of 
bits sent per signal. (See Table 9.1.) 
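
The trade-off between error probability and rate for repetition codes can be computed directly: with $k$ repetitions (k odd), a bit is decoded incorrectly when more than half of its copies are received in error. The function name below is an assumption.

```python
from math import comb

def majority_error(k, p_correct):
    """P(bit decoded wrongly) when each bit is sent k times (k odd) and
    decoded by majority rule; each copy is received correctly w.p. p_correct."""
    q = 1 - p_correct
    return sum(comb(k, e) * q ** e * p_correct ** (k - e)
               for e in range(k // 2 + 1, k + 1))

for k in (1, 3, 17):
    # repetitions, probability of bit error, rate in bits per signal
    print(k, majority_error(k, 0.8), 1 / k)
```

With $p = .8$, three repetitions give the error probability computed above, and seventeen repetitions bring it below .01 at a rate of only 1/17.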

In fact, at this point it may appear inevitable to the reader that decreasing the 
probability of bit error to 0 always results in also decreasing the effective rate at 
which bits are transmitted per signal to 0. However, a remarkable result of informa- 
tion theory known as the noisy coding theorem and due to Claude Shannon demon- 
strates that this is not the case. We now state this result as Theorem 4.2. 


Theorem 4.2  The noisy coding theorem

There is a number $C$ such that for any value $R$ that is less than $C$, and for any $\varepsilon > 0$,
there exists a coding-decoding scheme that transmits at an average rate of $R$ bits
sent per signal and with an error (per bit) probability of less than $\varepsilon$. The largest such
value of $C$, call it $C^*$,† is called the channel capacity, and for the binary symmetric
channel,

$$C^* = 1 + p \log p + (1 - p) \log(1 - p)$$


Table 9.1 Repetition of Bits Encoding Scheme.

    Probability of error (per bit)    Rate (bits transmitted per signal)
    .20                               1
    .10                               .33 (= 1/3)
    .01                               .06 (= 1/17)


† For an entropy interpretation of $C^*$, see Theoretical Exercise 9.18.
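
For the binary symmetric channel discussed above, the capacity formula is simple to evaluate. The sketch below computes it for $p = .8$; the function name is an assumption, and the convention $0 \log 0 = 0$ is applied at the endpoints.

```python
from math import log2

def bsc_capacity(p):
    """C* = 1 + p log2 p + (1 - p) log2(1 - p), with 0 log 0 taken as 0."""
    return 1 + sum(x * log2(x) for x in (p, 1 - p) if x > 0)

print(bsc_capacity(0.8))   # capacity when each bit is received correctly w.p. .8
print(bsc_capacity(0.5))   # a channel whose output is pure noise
```

For $p = .8$ the capacity is roughly .28 bits per signal, far above the rate 1/17 that the repetition scheme needs to reach a bit-error probability of .01.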


Summary 


The Poisson process having rate $\lambda$ is a collection of random
variables $\{N(t), t \ge 0\}$ that relate to an underlying
process of randomly occurring events. For instance, $N(t)$
represents the number of events that occur between times
0 and $t$. The defining features of the Poisson process are as
follows:

(i) The number of events that occur in disjoint time
intervals are independent.

(ii) The distribution of the number of events that occur
in an interval depends only on the length of the interval.

(iii) Events occur one at a time.

(iv) Events occur at rate $\lambda$.

It can be shown that $N(t)$ is a Poisson random variable with
mean $\lambda t$. In addition, if $T_i$, $i \ge 1$, are the times between the
successive events, then they are independent exponential
random variables with rate $\lambda$.

A sequence of random variables $X_n$, $n \ge 0$, each of
which takes on one of the values $0, \ldots, M$, is said to be
a Markov chain with transition probabilities $P_{i,j}$ if, for all
$n, i_0, \ldots, i_{n-1}, i, j$,

$$P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0\} = P_{i,j}$$

If we interpret $X_n$ as the state of some process at time
$n$, then a Markov chain is a sequence of successive states
of a process that has the property that whenever it enters
state $i$, then, independently of all past states, the next state
is $j$ with probability $P_{i,j}$, for all states $i$ and $j$. For many
Markov chains, the probability of being in state $j$ at time $n$
converges to a limiting value that does not depend on the
initial state. If we let $\pi_j$, $j = 0, \ldots, M$, denote these limiting
probabilities, then they are the unique solution of the
equations

$$\pi_j = \sum_{i=0}^{M} \pi_i P_{i,j}, \quad j = 0, \ldots, M$$
$$\sum_{j=0}^{M} \pi_j = 1$$

Moreover, $\pi_j$ is equal to the long-run proportion of time
that the chain is in state $j$.

Let $X$ be a random variable that takes on one of
$n$ possible values according to the set of probabilities
$\{p_1, \ldots, p_n\}$. The quantity

$$H(X) = -\sum_{i=1}^{n} p_i \log_2(p_i)$$

is called the entropy of $X$. It can be interpreted as representing
either the average amount of uncertainty that
exists regarding the value of $X$ or the average information
received when $X$ is observed. Entropy has important
implications for binary codings of $X$.


Problems and Theoretical Exercises


9.1. Customers arrive at a bank at a Poisson rate $\lambda$. Suppose
that two customers arrived during the first hour.
What is the probability that


(a) both arrived during the first 20 minutes?
(b) at least one arrived during the first 20 minutes?


9.2. Cars cross a certain point in the highway in accordance
with a Poisson process with rate $\lambda = 3$ per minute. If
Al runs blindly across the highway, what is the probability
that he will be uninjured if the amount of time that it takes
him to cross the road is $s$ seconds? (Assume that if he is on
the highway when a car passes by, then he will be injured.)
Do this exercise for $s = 2, 5, 10, 20$.


9.3. Suppose that in Problem 9.2, Al is agile enough to
escape from a single car, but if he encounters two or more
cars while attempting to cross the road, then he is injured.
What is the probability that he will be unhurt if it takes him
$s$ seconds to cross? Do this exercise for $s = 5, 10, 20, 30$.


9.4. Suppose that 3 white and 3 black balls are distributed
in two urns in such a way that each urn contains 3 balls.
We say that the system is in state $i$ if the first urn contains $i$
white balls, $i = 0, 1, 2, 3$. At each stage, 1 ball is drawn from
each urn and the ball drawn from the first urn is placed
in the second, and conversely with the ball from the second
urn. Let $X_n$ denote the state of the system after the
$n$th stage, and compute the transition probabilities of the
Markov chain $\{X_n, n \ge 0\}$.


9.5. Consider Example 2a. If there is a 50-50 chance of
rain today, compute the probability that it will rain 3 days
from now if $\alpha = .7$ and $\beta = .3$.


9.6. Compute the limiting probabilities for the model of 
Problem 9.4. 


9.7. A transition probability matrix is said to be doubly
stochastic if

$$\sum_{i=0}^{M} P_{ij} = 1$$

for all states $j = 0, 1, \ldots, M$. Show that if such a Markov
chain is ergodic, then $\pi_j = 1/(M + 1)$, $j = 0, 1, \ldots, M$.


9.8. On any given day, Buffy is either cheerful (c), so-so
(s), or gloomy (g). If she is cheerful today, then she will be
c, s, or g tomorrow with respective probabilities .7, .2, and
.1. If she is so-so today, then she will be c, s, or g tomorrow
with respective probabilities .4, .3, and .3. If she is gloomy
today, then Buffy will be c, s, or g tomorrow with probabilities
.2, .4, and .4. What proportion of time is Buffy
cheerful?


9.9. Suppose that whether it rains tomorrow depends on 
past weather conditions only through the past 2 days. 
Specifically, suppose that if it has rained yesterday and 
today, then it will rain tomorrow with probability .8; if it 
rained yesterday but not today, then it will rain tomorrow 
with probability .3; if it rained today but not yesterday, 
then it will rain tomorrow with probability .4; and if it 
has not rained either yesterday or today, then it will rain 
tomorrow with probability .2. What proportion of days 
does it rain? 


9.10. A certain person goes for a run each morning. When 
he leaves his house for his run, he is equally likely to go 
out either the front or the back door, and similarly, when 
he returns, he is equally likely to go to either the front or 
the back door. The runner owns 5 pairs of running shoes, 
which he takes off after the run at whichever door he hap- 
pens to be. If there are no shoes at the door from which 
he leaves to go running, he runs barefooted. We are inter- 
ested in determining the proportion of time that he runs 
barefooted. 


(a) Set this problem up as a Markov chain. Give the states 
and the transition probabilities. 


(b) Determine the proportion of days that he runs bare- 
footed. 


9.11. This problem refers to Example 2f. 


(a) Verify that the proposed value of $\Pi_j$ satisfies the necessary
equations.


(b) For any given molecule, what do you think is the (lim- 
iting) probability that it is in urn 1? 


(c) Do you think that the events that molecule $j$, $j \ge 1$, is
in urn 1 at a very large time would be (in the limit) independent?


(d) Explain why the limiting probabilities are as given.


9.12. Determine the entropy of the sum that is obtained
when a pair of fair dice is rolled.


9.13. Prove that if X can take on any of n possible values
with respective probabilities $P_1, \ldots, P_n$, then H(X) is maximized
when $P_i = 1/n$, $i = 1, \ldots, n$. What is H(X) equal to
in this case?


9.14. A pair of fair dice is rolled. Let

$$X = \begin{cases} 1 & \text{if the sum of the dice is 6} \\ 0 & \text{otherwise} \end{cases}$$

and let Y equal the value of the first die. Compute (a)
$H(Y)$, (b) $H_Y(X)$, and (c) $H(X, Y)$.


9.15. A coin having probability $p = \frac{2}{3}$ of coming up heads
is flipped 6 times. Compute the entropy of the outcome of
this experiment.


9.16. A random variable can take on any of n possible
values $x_1, \ldots, x_n$ with respective probabilities $p(x_i)$, $i = 1, \ldots, n$.
We shall attempt to determine the value of X by
asking a series of questions, each of which can be answered
“yes” or “no.” For instance, we may ask “Is $X = x_1$?” or
“Is X equal to either $x_1$ or $x_2$ or $x_3$?” and so on. What can
you say about the average number of such questions that
you will need to ask to determine the value of X?


9.17. Show that for any discrete random variable X and
function f,

$$H(f(X)) \le H(X)$$


9.18. In transmitting a bit from location A to location B,
if we let X denote the value of the bit sent at location
A and Y denote the value received at location B, then
$H(X) - H_Y(X)$ is called the rate of transmission of information
from A to B. The maximal rate of transmission, as
a function of $P\{X = 1\} = 1 - P\{X = 0\}$, is called the
channel capacity. Show that for a binary symmetric channel
with $P\{Y = 1 \mid X = 1\} = P\{Y = 0 \mid X = 0\} = p$,
the channel capacity is attained by the rate of transmission
of information when $P\{X = 1\} = \frac{1}{2}$ and its value is
$1 + p \log p + (1 - p) \log(1 - p)$.


Self-Test Problems and Exercises

9.1. Events occur according to a Poisson process with rate
$\lambda = 3$ per hour.


(a) What is the probability that no events occur between
times 8 and 10 in the morning?

(b) What is the expected value of the number of events
that occur between times 8 and 10 in the morning?

(c) What is the expected time of occurrence of the fifth
event after 2 P.M.?


9.2. Customers arrive at a certain retail establishment 
according to a Poisson process with rate $\lambda$ per hour. Suppose
that two customers arrive during the first hour. Find
the probability that 


(a) both arrived in the first 20 minutes; 
(b) at least one arrived in the first 30 minutes. 


9.3. Four out of every five trucks on the road are followed 
by a car, while one out of every six cars is followed by a 
truck. What proportion of vehicles on the road are trucks? 


9.4. A certain town’s weather is classified each day as 
being rainy, sunny, or overcast, but dry. If it is rainy one 
day, then it is equally likely to be either sunny or overcast
the following day. If it is not rainy, then there is one chance
in three that the weather will persist in whatever state it is 
in for another day, and if it does change, then it is equally 
likely to become either of the other two states. In the long 
run, what proportion of days are sunny? What proportion 
are rainy? 


9.5. Let X be a random variable that takes on 5 possible 
values with respective probabilities .35, .2, .2, .2, and .05. 
Also, let Y be a random variable that takes on 5 possible 
values with respective probabilities .05, .35, .1, .15, and .35.


(a) Show that H(X) > H(Y). 
(b) Using the result of Problem 9.13, give an intuitive 
explanation for the preceding inequality.


10.1 Introduction


How can we determine the probability of our winning a game of solitaire? 
(By solitaire, we mean any one of the standard solitaire games played with an ordi- 
nary deck of 52 playing cards and with some fixed playing strategy.) One possible 
approach is to start with the reasonable hypothesis that all (52)! possible arrange- 
ments of the deck of cards are equally likely to occur and then attempt to determine 
how many of these lead to a win. Unfortunately, there does not appear to be any sys- 
tematic method for determining the number of arrangements that lead to a win, and 
as (52)! is a rather large number and the only way to determine whether a particular 
arrangement leads to a win seems to be by playing the game out, it can be seen that 
this approach will not work. 

In fact, it might appear that the determination of the probability of winning 
at solitaire is mathematically intractable. However, all is not lost, for probability 
falls not only within the realm of mathematics, but also within the realm of applied 
science; and, as in all applied sciences, experimentation is a valuable technique. For 
our solitaire example, experimentation takes the form of playing a large number of 
such games or, better yet, programming a computer to do so. After playing, say, n 
games, if we let 
$$X_i = \begin{cases} 1 & \text{if the $i$th game results in a win} \\ 0 & \text{otherwise} \end{cases}$$

then $X_i$, $i = 1, \ldots, n$ will be independent Bernoulli random variables for which

$$E[X_i] = P\{\text{win at solitaire}\}$$

Hence, by the strong law of large numbers, we know that

$$\sum_{i=1}^{n} \frac{X_i}{n} = \frac{\text{number of games won}}{\text{number of games played}}$$

will, with probability 1, converge to P{win at solitaire}. That is, by playing a large
number of games, we can use the proportion of games won as an estimate of the 
probability of winning. This method of empirically determining probabilities by 
means of experimentation is known as simulation. 

In order to use a computer to initiate a simulation study, we must be able to 
generate the value of a uniform (0, 1) random variable; such variates are called ran- 
dom numbers. To generate them, most computers have a built-in subroutine, called a 
random-number generator, whose output is a sequence of pseudorandom numbers — 
a sequence of numbers that is, for all practical purposes, indistinguishable from a 
sample from the uniform (0, 1) distribution. Most random-number generators start 
with an initial value $X_0$, called the seed, and then recursively compute values by
specifying positive integers a, c, and m, and then letting

$$X_{n+1} = (aX_n + c) \text{ modulo } m, \quad n \ge 0 \tag{1.1}$$

where the foregoing means that $aX_n + c$ is divided by m and the remainder is taken
as the value of $X_{n+1}$. Thus, each $X_n$ is either $0, 1, \ldots, m - 1$, and the quantity $X_n/m$ is
taken as an approximation to a uniform (0, 1) random variable. It can be shown that
subject to suitable choices for a, c, and m, Equation (1.1) gives rise to a sequence 
of numbers that look as if they were generated from independent uniform (0, 1) 
random variables. 
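
The recursion of Equation (1.1) can be sketched in a few lines of Python. This is only an illustration: the name `lcg` and the default constants a = 1664525, c = 1013904223, m = 2^32 are assumptions chosen for the example and are not taken from the text, and no claim is made that they are the "suitable choices" referred to above.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield X_n / m, where X_{n+1} = (a * X_n + c) modulo m.

    The constants are illustrative assumptions, not values from the text.
    """
    x = seed
    while True:
        x = (a * x + c) % m   # remainder after dividing a*X_n + c by m
        yield x / m           # approximation to a uniform (0, 1) value


# Usage: the same seed always reproduces the same sequence.
gen = lcg(seed=12345)
first_five = [next(gen) for _ in range(5)]
```

Because the sequence is fully determined by the seed, rerunning a simulation with the same seed reproduces it exactly, which is convenient when debugging.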

As our starting point in simulation, we shall suppose that we can simulate from 
the uniform (0, 1) distribution, and we shall use the term random numbers to mean 
independent random variables from this distribution. 

In the solitaire example, we would need to program a computer to play out the 
game starting with a given ordering of the cards. However, since the initial ordering 
is supposed to be equally likely to be any of the (52)! possible permutations, it is also 
necessary to be able to generate a random permutation. Using only random num- 
bers, the following algorithm shows how this can be accomplished. The algorithm 
begins by randomly choosing one of the elements and then putting it in position n; it 
then randomly chooses among the remaining elements and puts the choice in position
$n - 1$, and so on. The algorithm efficiently makes a random choice among the
remaining elements by keeping these elements in an ordered list and then randomly 
choosing a position on that list. 


Generating a random permutation 


Suppose we are interested in generating a permutation of the integers $1, 2, \ldots, n$ such
that all $n!$ possible orderings are equally likely. Then, starting with any initial permutation,
we will accomplish this after $n - 1$ steps, where we interchange the positions
of two of the numbers of the permutation at each step. Throughout, we will keep
track of the permutation by letting $X(i)$, $i = 1, \ldots, n$ denote the number currently in
position $i$. The algorithm operates as follows:


1. Consider any arbitrary permutation, and let $X(i)$ denote the element in position
$i$, $i = 1, \ldots, n$. [For instance, we could take $X(i) = i$, $i = 1, \ldots, n$.]

2. Generate a random variable $N_n$ that is equally likely to equal any of the values
$1, 2, \ldots, n$.

3. Interchange the values of $X(N_n)$ and $X(n)$. The value of $X(n)$ will now remain
fixed. [For instance, suppose that $n = 4$ and initially $X(i) = i$, $i = 1, 2, 3, 4$. If
$N_4 = 3$, then the new permutation is $X(1) = 1$, $X(2) = 2$, $X(3) = 4$, $X(4) = 3$,
and element 3 will remain in position 4 throughout.]

4. Generate a random variable $N_{n-1}$ that is equally likely to be either $1, 2, \ldots, n - 1$.

5. Interchange the values of $X(N_{n-1})$ and $X(n - 1)$. [If $N_3 = 1$, then the new
permutation is $X(1) = 4$, $X(2) = 2$, $X(3) = 1$, $X(4) = 3$.]

6. Generate $N_{n-2}$, which is equally likely to be either $1, 2, \ldots, n - 2$.

7. Interchange the values of $X(N_{n-2})$ and $X(n - 2)$. [If $N_2 = 1$, then the new
permutation is $X(1) = 2$, $X(2) = 4$, $X(3) = 1$, $X(4) = 3$, and this is the final
permutation.]

8. Generate $N_{n-3}$, and so on. The algorithm continues until $N_2$ is generated, and
after the next interchange the resulting permutation is the final one.


To implement this algorithm, it is necessary to be able to generate a random 
variable that is equally likely to be any of the values $1, 2, \ldots, k$. To accomplish this,
let U denote a random number (that is, U is uniformly distributed on (0, 1)) and
note that kU is uniform on (0, k). Hence,

$$P\{i - 1 < kU < i\} = \frac{1}{k}, \quad i = 1, \ldots, k$$

so if we take $N_k = [kU] + 1$, where $[x]$ is the integer part of x (that is, the largest
integer less than or equal to x), then $N_k$ will have the desired distribution.
The algorithm can now be succinctly written as follows: 


Step 1. Let $X(1), \ldots, X(n)$ be any permutation of $1, 2, \ldots, n$. [For instance, we
can set $X(i) = i$, $i = 1, \ldots, n$.]

Step 2. Let $I = n$.

Step 3. Generate a random number U and set $N = [IU] + 1$.

Step 4. Interchange the values of $X(N)$ and $X(I)$.

Step 5. Reduce the value of $I$ by 1, and if $I > 1$, go to step 3.

Step 6. $X(1), \ldots, X(n)$ is the desired random generated permutation.
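
Steps 1 through 6 translate almost line for line into code. The following Python sketch is one possible rendering; the function name `random_permutation` and the optional `rng` argument (any callable returning a random number in [0, 1)) are assumptions introduced here so that the source of random numbers can be swapped out.

```python
import random


def random_permutation(n, rng=random.random):
    """Return a random permutation of 1, ..., n following Steps 1-6."""
    x = list(range(1, n + 1))          # Step 1: X(i) = i
    i = n                              # Step 2: I = n
    while i > 1:
        big_n = int(i * rng()) + 1     # Step 3: N = [IU] + 1, so N is in 1..I
        # Step 4: interchange X(N) and X(I) (Python lists are 0-indexed)
        x[big_n - 1], x[i - 1] = x[i - 1], x[big_n - 1]
        i -= 1                         # Step 5: reduce I by 1
    return x                           # Step 6


# Usage: a random ordering of a 52-card deck, cards numbered 1..52.
deck = random_permutation(52)
```

Feeding in random numbers that give N = 3, 1, 1 for n = 4 reproduces the worked instance in the numbered description above, ending with the permutation 2, 4, 1, 3.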


The foregoing algorithm for generating a random permutation is extremely use- 
ful. For instance, suppose that a statistician is developing an experiment to compare 
the effects of m different treatments on a set of n subjects. He decides to split the 
subjects into m different groups of respective sizes $n_1, n_2, \ldots, n_m$, where $\sum_{i=1}^{m} n_i = n$,
with the members of the ith group to receive treatment i. To eliminate any bias in 
the assignment of subjects to treatments (for instance, it would cloud the meaning of 
the experimental results if it turned out that all the “best” subjects had been put in 
the same group), it is imperative that the assignment of a subject to a given group be 
done “at random.” How is this to be accomplished?* 

A simple and efficient procedure is to arbitrarily number the subjects 1 through 
n and then generate a random permutation X(1),...,X(n) of 1,2,...,n. Now assign 
subjects $X(1), X(2), \ldots, X(n_1)$ to be in group 1; $X(n_1 + 1), \ldots, X(n_1 + n_2)$ to be
in group 2; and, in general, group j is to consist of subjects numbered
$X(n_1 + n_2 + \cdots + n_{j-1} + k)$, $k = 1, \ldots, n_j$.


* Another technique for randomly dividing the subjects when m = 2 was presented in Example 2g of Chapter 6.
The preceding procedure is faster, but requires more space than the one of Example 2g.


10.2 General Techniques for Simulating Continuous
Random Variables


In this section, we present two general methods for using random numbers to simulate
continuous random variables.


10.2.1 The Inverse Transformation Method


A general method for simulating a random variable having a continuous
distribution, called the inverse transformation method, is based on the following
proposition.


Proposition 2.1

Let U be a uniform (0, 1) random variable. For any continuous distribution function
F, if we define the random variable Y by

$$Y = F^{-1}(U)$$

then the random variable Y has distribution function F. [$F^{-1}(x)$ is defined to equal
that value y for which $F(y) = x$.]


Proof

$$F_Y(a) = P\{Y \le a\} = P\{F^{-1}(U) \le a\} \tag{2.1}$$

Now, since F(x) is a monotone function, it follows that $F^{-1}(U) \le a$ if and only if
$U \le F(a)$. Hence, from Equation (2.1), we have

$$F_Y(a) = P\{U \le F(a)\} = F(a)$$


It follows from Proposition 2.1 that we can simulate a random variable X having
a continuous distribution function F by generating a random number U and then
setting $X = F^{-1}(U)$.


Example 2a

Simulating an exponential random variable

If $F(x) = 1 - e^{-x}$, then $F^{-1}(u)$ is that value of x such that

$$1 - e^{-x} = u$$

or

$$x = -\log(1 - u)$$

Hence, if U is a uniform (0, 1) variable, then

$$F^{-1}(U) = -\log(1 - U)$$

is exponentially distributed with mean 1. Since $1 - U$ is also uniformly distributed on
(0, 1), it follows that $-\log U$ is exponential with mean 1. Since cX is exponential with
mean c when X is exponential with mean 1, it follows that $-c \log U$ is exponential
with mean c.
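
As a brief illustration of the inverse transformation method, the result $-c \log U$ can be coded directly. In this sketch the name `exponential` and the `rng` parameter are assumptions made for the example. Since Python's `random.random()` returns values in [0, 1), the code takes the logarithm of 1 − U rather than of U so that log 0 is never evaluated; as noted above, the two have the same distribution.

```python
import math
import random


def exponential(mean=1.0, rng=random.random):
    """Simulate an exponential random variable with the given mean."""
    u = 1.0 - rng()               # lies in (0, 1], so log(u) is always defined
    return -mean * math.log(u)    # -c log U is exponential with mean c


# Usage: the sample average should be close to the requested mean.
random.seed(0)
sample = [exponential(2.0) for _ in range(20000)]
sample_mean = sum(sample) / len(sample)
```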


The results of Example 2a can also be utilized to simulate a gamma random
variable.


Example 2b

Simulating a gamma (n, λ) random variable


To simulate from a gamma distribution with parameters $(n, \lambda)$ when n is an integer,
we use the fact that the sum of n independent exponential random variables, each
having rate $\lambda$, has this distribution. Hence, if $U_1, \ldots, U_n$ are independent uniform
(0, 1) random variables, then

$$X = -\sum_{i=1}^{n} \frac{1}{\lambda} \log U_i = -\frac{1}{\lambda} \log\left(\prod_{i=1}^{n} U_i\right)$$

has the desired distribution.
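
A minimal sketch of this example in Python follows. The function name `gamma_int` and its `rng` argument are hypothetical names introduced for illustration; the code multiplies the random numbers together and takes a single logarithm, as in the second form of the formula.

```python
import math
import random


def gamma_int(n, lam, rng=random.random):
    """Simulate a gamma (n, lam) random variable for integer n."""
    prod = 1.0
    for _ in range(n):
        prod *= 1.0 - rng()       # each factor lies in (0, 1]
    return -math.log(prod) / lam  # -(1/lam) * log(U_1 * ... * U_n)


# Usage: gamma (3, 2) has mean n / lam = 1.5.
random.seed(1)
sample = [gamma_int(3, 2.0) for _ in range(20000)]
sample_mean = sum(sample) / len(sample)
```

Taking one logarithm of the product is cheaper than summing n logarithms, although for very large n the product can underflow, in which case summing the logarithms is the safer choice.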


10.2.2 The Rejection Method 


Suppose that we have a method for simulating a random variable having density 
function g(x). We can use this method as the basis for simulating from the continu- 
ous distribution having density f(x) by simulating Y from g and then accepting the 
simulated value with a probability proportional to f(Y)/g(Y). 

Specifically, let c be a constant such that 


$$\frac{f(y)}{g(y)} \le c \quad \text{for all } y$$


We then have the following technique for simulating a random variable having 
density f. 


Rejection Method 


Step 1. Simulate Y having density g and simulate a random number U. 
Step 2. If $U \le f(Y)/cg(Y)$, set X = Y. Otherwise return to step 1.
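
The two steps can be sketched generically as follows. Everything named here is an assumption made for illustration: `rejection_sample` takes the target density `f`, the proposal density `g_density`, a function `g_sampler` that simulates from g, and the constant `c`. The usage example, which is not from the text, targets the density f(x) = 6x(1 − x) on (0, 1) with g uniform on (0, 1); since f attains its maximum 1.5 at x = 1/2, we may take c = 1.5.

```python
import random


def rejection_sample(f, g_density, g_sampler, c, rng=random.random):
    """Simulate X with density f using proposals Y drawn from density g."""
    while True:
        y = g_sampler()                      # Step 1: simulate Y having density g
        u = rng()                            #         and a random number U
        if u <= f(y) / (c * g_density(y)):   # Step 2: accept if U <= f(Y)/cg(Y)
            return y


def sample_example():
    """Illustrative target f(x) = 6x(1 - x) on (0, 1), uniform proposal, c = 1.5."""
    f = lambda x: 6.0 * x * (1.0 - x)
    return rejection_sample(f, lambda y: 1.0, random.random, 1.5)


random.seed(2)
sample = [sample_example() for _ in range(20000)]
```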


The rejection method is expressed pictorially in Figure 10.1. We now prove that 
it works. 


Figure 10.1 Rejection method for simulating a random variable X having density function f.
[Flowchart: Start; generate Y ~ g; generate a random number U.]

Proposition 2.2

The random variable X generated by the rejection method has density function f.


Proof Let X be the value obtained and let N denote the number of necessary iterations.
Then

$$P\{X \le x\} = P\{Y_N \le x\}$$
$$= P\left\{Y \le x \,\Big|\, U \le \frac{f(Y)}{cg(Y)}\right\}$$
$$= \frac{P\left\{Y \le x,\ U \le \frac{f(Y)}{cg(Y)}\right\}}{K}$$

where $K = P\{U \le f(Y)/cg(Y)\}$. Now, by independence, the joint density function
of Y and U is

$$f(y, u) = g(y), \quad 0 < u < 1$$

so, using the foregoing, we have

$$P\{X \le x\} = \frac{1}{K} \iint_{\substack{y \le x \\ 0 \le u \le f(y)/cg(y)}} g(y)\, du\, dy$$
$$= \frac{1}{K} \int_{-\infty}^{x} \int_{0}^{f(y)/cg(y)} du\, g(y)\, dy$$
$$= \frac{1}{cK} \int_{-\infty}^{x} f(y)\, dy \tag{2.2}$$

Letting x approach $\infty$ and using the fact that f is a density gives

$$1 = \frac{1}{cK} \int_{-\infty}^{\infty} f(y)\, dy = \frac{1}{cK}$$

Hence, from Equation (2.2), we obtain

$$P\{X \le x\} = \int_{-\infty}^{x} f(y)\, dy$$

which completes the proof.


Remarks (a) Note that the way in which we “accept the value Y with probability
$f(Y)/cg(Y)$” is by generating a random number U and then accepting Y if
$U \le f(Y)/cg(Y)$.

(b) Since each iteration will independently result in an accepted value with probability
$P\{U \le f(Y)/cg(Y)\} = K = 1/c$, it follows that the number of iterations has a
geometric distribution with mean c.


Example 2c

Simulating a normal random variable


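To simulate a unit normal random variable Z (that is, one with mean 0 and variance 1),
note first that the absolute value of Z has probability density function

$$f(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad 0 < x < \infty \tag{2.3}$$

We will start by simulating from the preceding density function by using the rejection
method, with g being the exponential density function with mean 1, that is,

$$g(x) = e^{-x}, \quad 0 < x < \infty$$

Now, note that

$$\frac{f(x)}{g(x)} = \sqrt{\frac{2}{\pi}} \exp\left\{\frac{-(x^2 - 2x)}{2}\right\}$$
$$= \sqrt{\frac{2}{\pi}} \exp\left\{\frac{-(x^2 - 2x + 1)}{2} + \frac{1}{2}\right\}$$
$$= \sqrt{\frac{2e}{\pi}} \exp\left\{\frac{-(x - 1)^2}{2}\right\} \tag{2.4}$$
$$\le \sqrt{\frac{2e}{\pi}}$$

Hence, we can take $c = \sqrt{2e/\pi}$; so, from Equation (2.4),

$$\frac{f(x)}{cg(x)} = \exp\left\{\frac{-(x - 1)^2}{2}\right\}$$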

Therefore, using the rejection method, we can simulate the absolute value of a unit 
normal random variable as follows: 


(a) Generate independent random variables Y and U, Y being exponential with
rate 1 and U being uniform on (0, 1).


(b) If $U \le \exp\{-(Y - 1)^2/2\}$, set X = Y. Otherwise, return to (a).


Once we have simulated a random variable X having Equation (2.3) as its density
function, we can then generate a unit normal random variable Z by letting Z be
equally likely to be either X or $-X$.

In step (b), the value Y is accepted if $U \le \exp\{-(Y - 1)^2/2\}$, which is equivalent
to $-\log U \ge (Y - 1)^2/2$. However, in Example 2a, it was shown that $-\log U$ is
exponential with rate 1, so steps (a) and (b) are equivalent to


(a’) Generate independent exponentials $Y_1$ and $Y_2$, each with rate 1.
(b’) If $Y_2 \ge (Y_1 - 1)^2/2$, set $X = Y_1$. Otherwise, return to (a’).


Suppose now that the foregoing results in $Y_1$ being accepted, so we know that $Y_2$
is larger than $(Y_1 - 1)^2/2$. By how much does the one exceed the other? To answer
this question, recall that $Y_2$ is exponential with rate 1; hence, given that it exceeds
some value, the amount by which $Y_2$ exceeds $(Y_1 - 1)^2/2$ [that is, its “additional life”
beyond the time $(Y_1 - 1)^2/2$] is (by the memoryless property) also exponentially
distributed with rate 1. That is, when we accept step (b’), not only do we obtain X
(the absolute value of a unit normal), but, by computing $Y_2 - (Y_1 - 1)^2/2$, we
also can generate an exponential random variable (that is independent of X) having
rate 1.

Summing up, then, we have the following algorithm that generates an exponen- 
tial with rate 1 and an independent unit normal random variable: 


Step 1. Generate $Y_1$, an exponential random variable with rate 1.

Step 2. Generate $Y_2$, an exponential random variable with rate 1.

Step 3. If $Y_2 - (Y_1 - 1)^2/2 > 0$, set $Y = Y_2 - (Y_1 - 1)^2/2$ and go to step
4. Otherwise, go to step 1.

Step 4. Generate a random number U, and set

$$Z = \begin{cases} Y_1 & \text{if } U \le \frac{1}{2} \\ -Y_1 & \text{if } U > \frac{1}{2} \end{cases}$$


The random variables Z and Y generated by the foregoing algorithm are independent,
with Z being normal with mean 0 and variance 1 and Y being exponential
with rate 1. (If we want the normal random variable to have mean $\mu$ and variance
$\sigma^2$, we just take $\mu + \sigma Z$.)
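
The four steps can be sketched in Python as follows. The names `exp1` and `normal_and_exponential` are assumptions introduced for this illustration; the exponentials with rate 1 are generated by the inverse transformation method of Example 2a.

```python
import math
import random


def exp1():
    """Exponential with rate 1 via -log of a value in (0, 1]."""
    return -math.log(1.0 - random.random())


def normal_and_exponential():
    """Return (Z, Y): a unit normal and an independent rate-1 exponential."""
    while True:
        y1 = exp1()                              # Step 1
        y2 = exp1()                              # Step 2
        excess = y2 - (y1 - 1.0) ** 2 / 2.0      # Step 3: accept if positive
        if excess > 0:
            break
    z = y1 if random.random() <= 0.5 else -y1    # Step 4: attach a random sign
    return z, excess


random.seed(3)
pairs = [normal_and_exponential() for _ in range(20000)]
```

To obtain a normal with mean mu and variance sigma squared, one would return `mu + sigma * z` in place of `z`.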


Remarks (a) Since $c = \sqrt{2e/\pi} \approx 1.32$, the algorithm requires a geometrically
distributed number of iterations of step 2 with mean 1.32.

(b) If we want to generate a sequence of unit normal random variables, then we
can use the exponential random variable Y obtained in step 3 as the initial exponential
needed in step 1 for the next normal to be generated. Hence, on the average, we
can simulate a unit normal by generating $1.64\ (= 2 \times 1.32 - 1)$ exponentials and
computing 1.32 squares.


Example 2d

Simulating normal random variables: the polar method


It was shown in Example 7b of Chapter 6 that if X and Y are independent unit normal
random variables, then their polar coordinates $R = \sqrt{X^2 + Y^2}$, $\Theta = \tan^{-1}(Y/X)$
are independent, with $R^2$ being exponentially distributed with mean 2 and $\Theta$ being
uniformly distributed on $(0, 2\pi)$. Hence, if $U_1$ and $U_2$ are random numbers, then,
using the result of Example 2a, we can set

$$R = (-2 \log U_1)^{1/2}$$
$$\Theta = 2\pi U_2$$

from which it follows that

$$X = R \cos\Theta = (-2 \log U_1)^{1/2} \cos(2\pi U_2)$$
$$Y = R \sin\Theta = (-2 \log U_1)^{1/2} \sin(2\pi U_2) \tag{2.5}$$

are independent unit normals.
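
Equation (2.5) can be coded directly. In the sketch below the function name `box_muller` and the `rng` argument are assumptions made for illustration; the first random number is replaced by one minus itself so that its logarithm is always defined, which does not change its distribution.

```python
import math
import random


def box_muller(rng=random.random):
    """Return a pair (X, Y) of independent unit normals via Equation (2.5)."""
    u1 = 1.0 - rng()                      # in (0, 1], avoids log(0)
    u2 = rng()
    r = math.sqrt(-2.0 * math.log(u1))    # R = (-2 log U1)^(1/2)
    theta = 2.0 * math.pi * u2            # Theta = 2 pi U2
    return r * math.cos(theta), r * math.sin(theta)


random.seed(4)
xs = [box_muller()[0] for _ in range(20000)]
```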


The preceding approach to generating unit normal random variables is called
the Box-Muller approach. Its efficiency suffers somewhat from its need to compute
the sine and cosine values. There is, however, a way to get around this potentially
time-consuming difficulty. To begin, note that if U is uniform on (0, 1), then 2U is
uniform on (0, 2), so $2U - 1$ is uniform on $(-1, 1)$. Thus, if we generate random
numbers $U_1$ and $U_2$ and set

$$V_1 = 2U_1 - 1$$
$$V_2 = 2U_2 - 1$$

then $(V_1, V_2)$ is uniformly distributed in the square of area 4 centered at (0, 0). (See
Figure 10.2.)

Suppose now that we continually generate such pairs $(V_1, V_2)$ until we obtain
one that is contained in the disk of radius 1 centered at (0, 0), that is, until
$V_1^2 + V_2^2 \le 1$. It then follows that such a pair $(V_1, V_2)$ is uniformly distributed in the disk.
Now, let $R$, $\Theta$ denote the polar coordinates of this pair. Then it is easy to verify that
$R$ and $\Theta$ are independent, with $R^2$ being uniformly distributed on (0, 1) and $\Theta$ being
uniformly distributed on $(0, 2\pi)$. (See Problem 10.13.)

[Figure 10.2: the square with corners $(\pm 1, \pm 1)$ and the inscribed circle $V_1^2 + V_2^2 = 1$ centered at (0, 0), with a point $(V_1, V_2)$ marked.]


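Since

$$\sin\Theta = \frac{V_2}{R} = \frac{V_2}{\sqrt{V_1^2 + V_2^2}}$$
$$\cos\Theta = \frac{V_1}{R} = \frac{V_1}{\sqrt{V_1^2 + V_2^2}}$$

it follows from Equation (2.5) that we can generate independent unit normals X and
Y by generating another random number U and setting

$$X = (-2 \log U)^{1/2}\, V_1/R$$
$$Y = (-2 \log U)^{1/2}\, V_2/R$$

In fact, because (conditional on $V_1^2 + V_2^2 \le 1$) $R^2$ is uniform on (0, 1) and is independent
of $\Theta$, we can use it instead of generating a new random number U, thus
showing that

$$X = (-2 \log R^2)^{1/2}\, \frac{V_1}{R} = \sqrt{\frac{-2 \log S}{S}}\, V_1$$
$$Y = (-2 \log R^2)^{1/2}\, \frac{V_2}{R} = \sqrt{\frac{-2 \log S}{S}}\, V_2$$

are independent unit normals, where

$$S = R^2 = V_1^2 + V_2^2$$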

Summing up, we have the following approach to generating a pair of independent
unit normals:

Step 1. Generate random numbers $U_1$ and $U_2$.

Step 2. Set $V_1 = 2U_1 - 1$, $V_2 = 2U_2 - 1$, $S = V_1^2 + V_2^2$.

Step 3. If $S > 1$, return to step 1.

Step 4. Return the independent unit normals

$$X = \sqrt{\frac{-2 \log S}{S}}\, V_1, \qquad Y = \sqrt{\frac{-2 \log S}{S}}\, V_2$$


The preceding algorithm is called the polar method. Since the probability that a
random point in the square will fall within the circle is equal to $\pi/4$ (the area of the
circle divided by the area of the square), it follows that, on average, the polar method
will require $4/\pi \approx 1.273$ iterations of step 1. Hence, it will, on average, require 2.546
random numbers, 1 logarithm, 1 square root, 1 division, and 4.546 multiplications to
generate 2 independent unit normals.
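
The polar method can be sketched as follows. The name `polar_method` and the `rng` argument are assumptions introduced here. One detail goes beyond the steps as stated: the sketch also rejects S = 0, an event of probability 0 in theory but one that would cause a division by zero in floating-point arithmetic.

```python
import math
import random


def polar_method(rng=random.random):
    """Return a pair of independent unit normals using the polar method."""
    while True:
        v1 = 2.0 * rng() - 1.0            # Steps 1-2: V1 = 2U1 - 1
        v2 = 2.0 * rng() - 1.0            #            V2 = 2U2 - 1
        s = v1 * v1 + v2 * v2             #            S = V1^2 + V2^2
        if 0.0 < s <= 1.0:                # Step 3: retry if S > 1 (or S == 0)
            break
    factor = math.sqrt(-2.0 * math.log(s) / s)
    return factor * v1, factor * v2       # Step 4


random.seed(5)
pairs = [polar_method() for _ in range(20000)]
```

No sine or cosine is evaluated, which is the advantage over the Box-Muller formulas of Equation (2.5).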


Example 2e

Simulating a chi-squared random variable


The chi-squared distribution with n degrees of freedom is the distribution of
$\chi_n^2 = Z_1^2 + \cdots + Z_n^2$, where $Z_i$, $i = 1, \ldots, n$ are independent unit normals. Now, it was
shown in Section 6.3 of Chapter 6 that $Z_1^2 + Z_2^2$ has an exponential distribution
with rate $\frac{1}{2}$. Hence, when n is even (say, $n = 2k$), $\chi_{2k}^2$ has a gamma distribution
with parameters $(k, \frac{1}{2})$. Thus, $-2 \log\left(\prod_{i=1}^{k} U_i\right)$ has a chi-squared distribution with
2k degrees of freedom. Accordingly, we can simulate a chi-squared random variable
with $2k + 1$ degrees of freedom by first simulating a unit normal random variable Z
and then adding $Z^2$ to the foregoing. That is,

$$\chi_{2k+1}^2 = Z^2 - 2 \log\left(\prod_{i=1}^{k} U_i\right)$$

where $Z, U_1, \ldots, U_k$ are independent, Z is a unit normal, and $U_1, \ldots, U_k$ are uniform
(0, 1) random variables.
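
A short Python sketch of this example is given below. The function name `chi_squared` is an assumption, and for brevity the unit normal Z is drawn with the standard library's `random.gauss` rather than with one of the methods of Examples 2c and 2d. The logarithms are summed rather than taking the logarithm of the product; the two are mathematically identical, and the sum avoids underflow when k is large.

```python
import math
import random


def chi_squared(df):
    """Simulate a chi-squared random variable with df degrees of freedom."""
    k, odd = divmod(df, 2)                # df = 2k or df = 2k + 1
    # -2 log(U_1 * ... * U_k), written as a sum of logarithms
    value = -2.0 * sum(math.log(1.0 - random.random()) for _ in range(k))
    if odd:
        z = random.gauss(0.0, 1.0)        # stand-in unit normal (assumption)
        value += z * z                    # add Z^2 for an odd number of degrees
    return value


random.seed(6)
sample_even = [chi_squared(4) for _ in range(20000)]
sample_odd = [chi_squared(5) for _ in range(20000)]
```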


10.3 Simulating from Discrete Distributions 


All of the general methods for simulating random variables from continuous dis- 
tributions have analogs in the discrete case. For instance, if we want to simulate a 
random variable X having probability mass function

$$P\{X = x_j\} = P_j, \quad j = 1, 2, \ldots, \qquad \sum_j P_j = 1$$

we can use the following discrete time analog of the inverse transform technique:
To simulate X for which $P\{X = x_j\} = P_j$, let U be uniformly distributed over
(0, 1) and set

$$X = \begin{cases} x_1 & \text{if } U \le P_1 \\ x_2 & \text{if } P_1 < U \le P_1 + P_2 \\ \ \vdots \\ x_j & \text{if } \sum_{i=1}^{j-1} P_i < U \le \sum_{i=1}^{j} P_i \\ \ \vdots \end{cases}$$

Since

$$P\{X = x_j\} = P\left\{\sum_{i=1}^{j-1} P_i < U \le \sum_{i=1}^{j} P_i\right\} = P_j$$

it follows that X has the desired distribution.
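
For a distribution with finitely many values, the discrete inverse transform amounts to walking through the cumulative sums until U is covered. The following is a minimal sketch; the name `discrete_inverse_transform` and its arguments are assumptions made for illustration, and the final `return` is only a guard against floating-point round-off in the cumulative sum.

```python
import random


def discrete_inverse_transform(values, probs, rng=random.random):
    """Return values[j] with probability probs[j]."""
    u = rng()
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p           # P_1 + ... + P_j
        if u <= cumulative:       # first j with U <= P_1 + ... + P_j
            return x
    return values[-1]             # reached only through round-off error


# Usage: a three-point distribution.
random.seed(7)
draws = [discrete_inverse_transform(["a", "b", "c"], [0.2, 0.5, 0.3])
         for _ in range(20000)]
```

Listing the most probable values first reduces the average number of comparisons, since the search stops as soon as U is covered.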


Example 3a

The geometric distribution


Suppose that independent trials, each of which results in a “success” with probability
$p$, $0 < p < 1$, are continually performed until a success occurs. Letting X denote the
necessary number of trials, we have

$$P\{X = i\} = (1 - p)^{i-1} p, \quad i \ge 1$$

which is seen by noting that $X = i$ if the first $i - 1$ trials are all failures and the ith
trial is a success. The random variable X is said to be a geometric random variable
with parameter p. Since

$$\sum_{i=1}^{j-1} P\{X = i\} = 1 - P\{X > j - 1\}$$
$$= 1 - P\{\text{first } j - 1 \text{ are all failures}\}$$
$$= 1 - (1 - p)^{j-1}, \quad j \ge 1$$

we can simulate such a random variable by generating a random number U and then
setting X equal to that value j for which

$$1 - (1 - p)^{j-1} \le U < 1 - (1 - p)^{j}$$

or, equivalently, for which

$$(1 - p)^{j} < 1 - U \le (1 - p)^{j-1}$$

Since $1 - U$ has the same distribution as U, we can define X by

$$X = \min\{j : (1 - p)^j < U\}$$
$$= \min\{j : j \log(1 - p) < \log U\}$$
$$= \min\left\{j : j > \frac{\log U}{\log(1 - p)}\right\}$$

where the inequality has changed sign because $\log(1 - p)$ is negative [since $\log(1 - p)
< \log 1 = 0$]. Using the notation $[x]$ for the integer part of x (that is, $[x]$ is the largest
integer less than or equal to x), we can write

$$X = 1 + \left[\frac{\log U}{\log(1 - p)}\right]$$
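
The closed form just obtained needs only one random number and one division, however large X turns out to be. A sketch in Python, with the hypothetical name `geometric` and an assumed `rng` argument:

```python
import math
import random


def geometric(p, rng=random.random):
    """Simulate a geometric random variable with parameter p, 0 < p < 1."""
    u = 1.0 - rng()                                   # in (0, 1], avoids log(0)
    # Both logarithms are <= 0, so the ratio is >= 0 and int() acts as the
    # integer part [x].
    return 1 + int(math.log(u) / math.log(1.0 - p))


# Usage: with p = 0.25 the mean number of trials is 1/p = 4.
random.seed(8)
sample = [geometric(0.25) for _ in range(20000)]
sample_mean = sum(sample) / len(sample)
```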


As in the continuous case, special simulating techniques have been developed 
for the more common discrete distributions. We now present two of these. 


Example 3b

Simulating a binomial random variable


A binomial (n, p) random variable can easily be simulated by recalling that it can
be expressed as the sum of n independent Bernoulli random variables. That is, if
$U_1, \ldots, U_n$ are independent uniform (0, 1) variables, then letting

$$X_i = \begin{cases} 1 & \text{if } U_i < p \\ 0 & \text{otherwise} \end{cases}$$

it follows that $X = \sum_{i=1}^{n} X_i$ is a binomial random variable with parameters n and p.
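
In code, this construction is a one-line count of how many of the n random numbers fall below p. The name `binomial` and the `rng` argument in the following sketch are assumptions made for illustration.

```python
import random


def binomial(n, p, rng=random.random):
    """Simulate a binomial (n, p) random variable as a sum of n Bernoullis."""
    return sum(1 for _ in range(n) if rng() < p)   # X_i = 1 when U_i < p


# Usage: binomial (10, 0.3) has mean n * p = 3.
random.seed(9)
sample = [binomial(10, 0.3) for _ in range(20000)]
sample_mean = sum(sample) / len(sample)
```

This approach uses n random numbers per simulated value, so its cost grows linearly with n.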


Example 3c

Simulating a Poisson random variable


To simulate a Poisson random variable with mean $\lambda$, generate independent uniform
(0, 1) random variables $U_1, U_2, \ldots$ stopping at

$$N = \min\left\{n : \prod_{i=1}^{n} U_i < e^{-\lambda}\right\}$$

The random variable $X \equiv N - 1$ has the desired distribution. That is, if we continue
generating random numbers until their product falls below $e^{-\lambda}$, then the number
required, minus 1, is Poisson with mean $\lambda$.

That $X \equiv N - 1$ is indeed a Poisson random variable having mean $\lambda$ can perhaps
be most easily seen by noting that

$$X + 1 = \min\left\{n : \prod_{i=1}^{n} U_i < e^{-\lambda}\right\}$$

is equivalent to

$$X = \max\left\{n : \prod_{i=1}^{n} U_i \ge e^{-\lambda}\right\} \quad \text{where } \prod_{i=1}^{0} U_i \equiv 1$$

or, taking logarithms, to

$$X = \max\left\{n : \sum_{i=1}^{n} \log U_i \ge -\lambda\right\}$$

or

$$X = \max\left\{n : \sum_{i=1}^{n} -\log U_i \le \lambda\right\}$$

However, $-\log U_i$ is exponential with rate 1, so X can be thought of as being the
maximum number of exponentials having rate 1 that can be summed and still be
less than $\lambda$. But by recalling that the times between successive events of a Poisson
process having rate 1 are independent exponentials with rate 1, it follows that X is
equal to the number of events by time $\lambda$ of a Poisson process having rate 1; thus, X
has a Poisson distribution with mean $\lambda$.
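
The stopping rule can be sketched as a simple loop that multiplies random numbers until the running product drops below $e^{-\lambda}$. The function name `poisson` and its `rng` argument are assumptions introduced for this illustration.

```python
import math
import random


def poisson(lam, rng=random.random):
    """Simulate a Poisson random variable with mean lam."""
    threshold = math.exp(-lam)
    n = 0
    prod = 1.0
    while True:
        prod *= rng()             # running product U_1 * ... * U_n
        n += 1
        if prod < threshold:      # N is the first n with product below e^(-lam)
            return n - 1          # X = N - 1


# Usage: the sample mean and variance should both be near lam = 3.
random.seed(10)
sample = [poisson(3.0) for _ in range(20000)]
```

On average the loop uses about lam + 1 random numbers per value, so this method is best suited to moderate values of lam.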


10.4 Variance Reduction Techniques 


Let $X_1, \ldots, X_n$ have a given joint distribution, and suppose that we are interested in
computing

$$\theta \equiv E[g(X_1, \ldots, X_n)]$$

where g is some specified function. It sometimes turns out that it is extremely difficult
to analytically compute $\theta$, and when such is the case, we can attempt to use
simulation to estimate $\theta$. This is done as follows: Generate $X_1^{(1)}, \ldots, X_n^{(1)}$ having the
same joint distribution as $X_1, \ldots, X_n$ and set

$$Y_1 = g\left(X_1^{(1)}, \ldots, X_n^{(1)}\right)$$

Now let $X_1^{(2)}, \ldots, X_n^{(2)}$ simulate a second set of random variables (independent of the
first set) having the distribution of $X_1, \ldots, X_n$, and set

$$Y_2 = g\left(X_1^{(2)}, \ldots, X_n^{(2)}\right)$$

Continue this until you have generated k (some predetermined number) sets and so
have also computed $Y_1, Y_2, \ldots, Y_k$. Now, $Y_1, \ldots, Y_k$ are independent and identically
distributed random variables, each having the same distribution as $g(X_1, \ldots, X_n)$.
Thus, if we let $\bar{Y}$ denote the average of these k random variables, that is, if

$$\bar{Y} = \sum_{i=1}^{k} \frac{Y_i}{k}$$

then

$$E[\bar{Y}] = \theta$$
$$E[(\bar{Y} - \theta)^2] = \operatorname{Var}(\bar{Y})$$

Hence, we can use $\bar{Y}$ as an estimate of $\theta$. Since the expected square of the difference
between $\bar{Y}$ and $\theta$ is equal to the variance of $\bar{Y}$, we would like this quantity to be as
small as possible. [In the preceding situation, $\operatorname{Var}(\bar{Y}) = \operatorname{Var}(Y_i)/k$, which is usually
not known in advance, but must be estimated from the generated values $Y_1, \ldots, Y_k$.]
We now present three general techniques for reducing the variance of our estimator.


10.4.1 Use of Antithetic Variables


In the foregoing situation, suppose that we have generated $Y_1$ and $Y_2$, which are
identically distributed random variables having mean $\theta$. Now,

$$\operatorname{Var}\left(\frac{Y_1 + Y_2}{2}\right) = \frac{1}{4}\left[\operatorname{Var}(Y_1) + \operatorname{Var}(Y_2) + 2\operatorname{Cov}(Y_1, Y_2)\right]$$
$$= \frac{\operatorname{Var}(Y_1)}{2} + \frac{\operatorname{Cov}(Y_1, Y_2)}{2}$$

Hence, it would be advantageous (in the sense that the variance would be reduced)
if $Y_1$ and $Y_2$ were negatively correlated rather than being independent. To see how
we could arrange this, let us suppose that the random variables $X_1, \ldots, X_n$ are independent
and, in addition, that each is simulated via the inverse transform technique.
That is, $X_i$ is simulated from $F_i^{-1}(U_i)$, where $U_i$ is a random number and $F_i$ is the
distribution of $X_i$. Thus, $Y_1$ can be expressed as

$$Y_1 = g\left(F_1^{-1}(U_1), \ldots, F_n^{-1}(U_n)\right)$$

Now, since $1 - U$ is also uniform over (0, 1) whenever U is a random number (and
is negatively correlated with U), it follows that $Y_2$ defined by

$$Y_2 = g\left(F_1^{-1}(1 - U_1), \ldots, F_n^{-1}(1 - U_n)\right)$$

will have the same distribution as $Y_1$. Hence, if $Y_1$ and $Y_2$ were negatively correlated,
then generating $Y_2$ by this means would lead to a smaller variance than if it were
generated by a new set of random numbers. (In addition, there is a computational
savings because, rather than having to generate n additional random numbers, we
need only subtract each of the previous n numbers from 1.) Although we cannot,
in general, be certain that $Y_1$ and $Y_2$ will be negatively correlated, this often turns
out to be the case, and indeed it can be proven that it will be so whenever g is a
monotonic function.


10.4.2 Variance Reduction by Conditioning

Let us start by recalling the conditional variance formula (see Section 7.5.4)

$$\operatorname{Var}(Y) = E[\operatorname{Var}(Y \mid Z)] + \operatorname{Var}(E[Y \mid Z])$$

Now, suppose that we are interested in estimating $E[g(X_1, \ldots, X_n)]$ by simulating
$\mathbf{X} = (X_1, \ldots, X_n)$ and then computing $Y = g(\mathbf{X})$. If, for some random variable Z
we can compute $E[Y \mid Z]$, then, since $\operatorname{Var}(Y \mid Z) \ge 0$, it follows from the preceding
conditional variance formula that

$$\operatorname{Var}(E[Y \mid Z]) \le \operatorname{Var}(Y)$$

Thus, since $E[E[Y \mid Z]] = E[Y]$, it follows that $E[Y \mid Z]$ is a better estimator of $E[Y]$
than is Y.


Estimation of z 


Let U; and U2 be random numbers and set V; = 2U; — 1,i = 1,2. As noted in 
Example 2d, (V;, V2) will be uniformly distributed in the square of area 4 centered 
at (0, 0). The probability that this point will fall within the inscribed circle of radius 1 
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centered at (0, 0) (see Figure 10.2) is equal to 7/4 (the ratio of the area of the circle 
to that of the square). Hence, upon simulating a large number n of such pairs and 
setting 


Te 1 if the jth pair falls within the circle 
!~ )0 otherwise 


it follows that J;,j = 1,...,n, will be independent and identically distributed random 
variables having E[J/;] = 1/4. Thus, by the strong law of large numbers, 


+e tlh 
n 


a 
> as N—-©Co 
4 


Therefore, by simulating a large number of pairs (V1, V2) and multiplying the 
proportion of them that fall within the circle by 4, we can accurately approximate π. 

The preceding estimator can, however, be improved upon by using conditional 
expectation. If we let I be the indicator variable for the pair (V1, V2), then, rather 
than using the observed value of I, it is better to condition on V1 and so utilize 


    E[I|V1] = P{V1² + V2² ≤ 1 | V1} 
            = P{V2² ≤ 1 − V1² | V1} 


Now, 
    P{V2² ≤ 1 − V1² | V1 = v} = P{V2² ≤ 1 − v²} 
                               = P{−√(1 − v²) ≤ V2 ≤ √(1 − v²)} 
                               = √(1 − v²) 
so 


    E[I|V1] = √(1 − V1²) 


Thus, an improvement on using the average value of I to estimate π/4 is to use the 
average value of √(1 − V1²). Indeed, since 


    E[√(1 − V1²)] = ∫_{−1}^{1} (1/2)√(1 − v²) dv = ∫_{0}^{1} √(1 − u²) du = E[√(1 − U²)] 


where U is uniform over (0, 1), we can generate n random numbers U and use the 
average value of √(1 − U²) as our estimate of π/4. (Problem 10.14 shows that this 
estimator has the same variance as the average of the n values, √(1 − V²).) 

The preceding estimator of π can be improved even further by noting that the 
function g(u) = √(1 − u²), 0 ≤ u ≤ 1, is a monotonically decreasing function of u, 
and so the method of antithetic variables will reduce the variance of the estimator 
of E[√(1 − U²)]. That is, rather than generating n random numbers and using the 
average value of √(1 − U²) as an estimator of π/4, we would obtain an improved 
estimator by generating only n/2 random numbers U and then using one-half the 
average of √(1 − U²) + √(1 − (1 − U)²) as the estimator of π/4. 

The following table gives the estimates of π resulting from simulations, using 
n = 10,000, based on the three estimators. 

Method                                                      Estimate of π 
Proportion of the random points that fall in the circle     3.1612 
Average value of √(1 − U²)                                  3.128448 
Average value of √(1 − U²) + √(1 − (1 − U)²)                3.139578 


A further simulation using the final approach and n = 64,000 yielded the estimate 
3.143288. 
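The three estimators of π compared in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the function name estimate_pi, the seed, and the use of Python's random module are assumptions made here, and the numbers it produces will differ from those in the table.

```python
import math
import random


def estimate_pi(n, seed=0):
    """Return (raw, conditional, antithetic) estimates of pi."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability only

    # 1. Proportion of points (V1, V2) that fall in the unit circle, times 4.
    hits = 0
    for _ in range(n):
        v1 = 2 * rng.random() - 1
        v2 = 2 * rng.random() - 1
        if v1 * v1 + v2 * v2 <= 1:
            hits += 1
    raw = 4 * hits / n

    # 2. Conditioning: average of sqrt(1 - U^2) estimates pi/4.
    cond = 4 * sum(math.sqrt(1 - rng.random() ** 2) for _ in range(n)) / n

    # 3. Antithetic variables: n/2 random numbers, each used as U and as 1 - U.
    m = n // 2
    total = 0.0
    for _ in range(m):
        u = rng.random()
        total += math.sqrt(1 - u * u) + math.sqrt(1 - (1 - u) ** 2)
    anti = 4 * (total / m) / 2  # one-half the average of the paired sum

    return raw, cond, anti


raw, cond, anti = estimate_pi(10_000)
print(raw, cond, anti)
```

Repeating the run with different seeds gives a feel for how much less the second and third estimates vary than the first.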


10.4.3 Control Variates 


Again, suppose that we want to use simulation to estimate E[g(X)], where X = 
(X1,...,Xn). But suppose now that for some function f, the expected value of f(X) 
is known; say, it is E[f(X)] = μ. Then, for any constant a, we can also use 


    W = g(X) + a[f(X) − μ] 
as an estimator of E[g(X)]. Now, 
    Var(W) = Var[g(X)] + a²Var[f(X)] + 2a Cov[g(X), f(X)]        (4.1) 


Simple calculus shows that the foregoing is minimized when 


    a = −Cov[f(X), g(X)] / Var[f(X)]        (4.2) 


and for this value of a, 


    Var(W) = Var[g(X)] − [Cov(f(X), g(X))]² / Var[f(X)]        (4.3) 


Unfortunately, neither Var[f(X)] nor Cov[f(X), g(X)] is usually known, so we 
cannot in general obtain the foregoing reduction in variance. One approach in practice 
is to use the simulated data to estimate these quantities. This approach usually yields 
almost all of the theoretically possible reduction in variance. 
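As a minimal sketch of the control variate idea, consider estimating E[e^U] for a random number U, using f(U) = U with known mean μ = 1/2 as the control. The target e^U, the function name, and the seed are illustrative assumptions chosen here, not taken from the text; the constant a is estimated from the simulated data as suggested above.

```python
import math
import random


def control_variate_estimate(n, seed=0):
    """Estimate E[exp(U)] with and without the control variate f(U) = U."""
    rng = random.Random(seed)  # hypothetical fixed seed
    us = [rng.random() for _ in range(n)]
    g = [math.exp(u) for u in us]  # g(X) values
    f = us                         # f(X) values; E[f(X)] = mu is known
    mu = 0.5

    g_bar = sum(g) / n
    f_bar = sum(f) / n

    # Sample estimates of Cov[f(X), g(X)] and Var[f(X)].
    cov_fg = sum((gi - g_bar) * (fi - f_bar) for gi, fi in zip(g, f)) / (n - 1)
    var_f = sum((fi - f_bar) ** 2 for fi in f) / (n - 1)

    a = -cov_fg / var_f  # estimated version of Equation (4.2)

    # Average of W = g(X) + a [f(X) - mu].
    w_bar = sum(gi + a * (fi - mu) for gi, fi in zip(g, f)) / n
    return g_bar, w_bar, a


plain, controlled, a_hat = control_variate_estimate(10_000)
print(plain, controlled, a_hat, math.e - 1)
```

Because e^U and U are strongly positively correlated, the estimated a is negative and the controlled estimate typically lands much closer to e − 1 than the plain average does.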


Summary 


Let F be a continuous distribution function and U a uniform 
(0, 1) random variable. Then the random variable 
F⁻¹(U) has distribution function F, where F⁻¹(u) is that 
value x such that F(x) = u. Applying this result, we can 
use the values of uniform (0, 1) random variables, called 
random numbers, to generate the values of other random 
variables. This technique is called the inverse transform 
method. 
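As an illustration of the inverse transform method, the sketch below generates an exponential random variable with a given rate. For F(x) = 1 − e^(−λx), solving F(x) = u gives F⁻¹(u) = −log(1 − u)/λ. The function name and the choice of the exponential distribution are assumptions made for this example.

```python
import math
import random


def exponential_inverse_transform(rate, rng):
    """Return one exponential(rate) value computed as F^{-1}(U)."""
    u = rng.random()                  # random number in [0, 1)
    return -math.log(1.0 - u) / rate  # 1 - u lies in (0, 1], so the log is defined


rng = random.Random(1)  # hypothetical seed
sample = [exponential_inverse_transform(2.0, rng) for _ in range(20_000)]
print(sum(sample) / len(sample))  # should be near the mean 1/rate = 0.5
```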

Another technique for generating random variables is 
based on the rejection method. Suppose that we have an 
efficient procedure for generating a random variable from 
the density function g and that we desire to generate a 
random variable having density function f. The rejection 
method for accomplishing this starts by determining a 
constant c such that 


    f(x)/g(x) ≤ c   for all x 


It then proceeds as follows: 


1. Generate Y having density g. 

2. Generate a random number U. 

3. If U ≤ f(Y)/cg(Y), set X = Y and stop. 
4. Return to step 1. 


The number of passes through step 1 is a geometric ran- 
dom variable with mean c. 
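The four steps above can be sketched as a short Python function. The target density f(x) = 20x(1 − x)³ on (0, 1), the uniform proposal density g, and the constant c = 135/64 (the maximum of f, attained at x = 1/4) are illustrative choices made here; the function and parameter names are likewise assumptions.

```python
import random


def rejection_sample(f, g_sampler, g_density, c, rng):
    """Return (X, passes): a value with density f and the number of passes used."""
    passes = 0
    while True:
        passes += 1
        y = g_sampler(rng)                    # step 1: generate Y having density g
        u = rng.random()                      # step 2: generate a random number U
        if u <= f(y) / (c * g_density(y)):    # step 3: accept if U <= f(Y)/(c g(Y))
            return y, passes
        # step 4: otherwise return to step 1


def f(x):
    return 20 * x * (1 - x) ** 3  # illustrative target density on (0, 1)


c = 135 / 64  # max of f(x)/g(x) when g is the uniform (0, 1) density

rng = random.Random(2)  # hypothetical seed
draws = [rejection_sample(f, lambda r: r.random(), lambda y: 1.0, c, rng)
         for _ in range(20_000)]
mean_x = sum(x for x, _ in draws) / len(draws)
mean_passes = sum(p for _, p in draws) / len(draws)
print(mean_x, mean_passes)  # mean_passes should be near c
```

The average number of passes gives an empirical check on the statement that the number of iterations is geometric with mean c.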

Standard normal random variables can be efficiently 
simulated by the rejection method (with g being exponen- 
tial with mean 1) or by the technique known as the polar 
algorithm. 
To estimate a quantity θ, one often generates the 
values of a partial sequence of random variables whose 
expected value is θ. The efficiency of this approach is 
increased when these random variables have a small 
variance. Three techniques that can often be used to specify 
random variables with mean θ and relatively small 
variances are 


1. the use of antithetic variables, 
2. the use of conditional expectations, and 
3. the use of control variates. 


Problems 


10.1. The following algorithm will generate a random 
permutation of the elements 1, 2, ..., n. It is somewhat faster 
than the one presented in Example 1a but is such that no 
position is fixed until the algorithm ends. In this algorithm, 
P(i) can be interpreted as the element in position i. 


Step 1. Set k = 1. 
Step 2. Set P(1) = 1. 
Step 3. If k = n, stop. Otherwise, let k = k + 1. 
Step 4. Generate a random number U and let 

            P(k) = P([kU] + 1) 
            P([kU] + 1) = k 

        Go to step 3. 


(a) Explain in words what the algorithm is doing. 


(b) Show that at iteration k (that is, when the value of 
P(k) is initially set), P(1), P(2), ..., P(k) is a random 
permutation of 1, 2, ..., k. 

Hint: Use induction and argue that 


    P_k{i1, i2, ..., i(j−1), k, ij, ..., i(k−2), i} 
        = P_(k−1){i1, i2, ..., i(j−1), i, ij, ..., i(k−2)} · (1/k) 
        = 1/k!   by the induction hypothesis 


10.2. Develop a technique for simulating a random 
variable having density function 


    f(x) = e^(2x),     −∞ < x < 0 
    f(x) = e^(−2x),    0 < x < ∞ 


10.3. Give a technique for simulating values from the 
distribution having the probability density function 


    f(x) = (1/4)|x|,   x ∈ [−2, 2] 


10.4. Given the distribution function defined by 


0 g< i 

1 

3 1=z<2 
F(Z) = 1 2-4 

= 3 

3 1° 45 a 

2 1 

a ~(1 — e% =>3 

3 + 36 CO) 8 


how would random numbers be generated in a way so as 
to follow the probabilistic pattern above? 


10.5. F is the function defined by 


F@=e*® z=0 


Show that it is a distribution function. Obtain the inverse 
transformation that may be used to generate random num- 
bers following distribution F. 


10.6. Given distributions 


    F(x) = 1 / (1 + exp(−x^(2n+1))),   −∞ ≤ x ≤ ∞ 


give a simulation method for each of the following 
specifications. 


(a) n = 0 
(b) n = 2 


10.7. Let F be the distribution function 


    F(x) = x^n,   0 < x < 1 

(a) Give a method for simulating a random variable having 
distribution F that uses only a single random number. 

(b) Let U1, ..., Un be independent random numbers. 
Show that 


    P{max(U1, ..., Un) ≤ x} = x^n 


(c) Use part (b) to give a second method of simulating a 
random variable having distribution F. 


10.8. Independent random variables V1, ..., Vn share the 
distribution function 


How would you simulate values from max{Vi : 1 ≤ i ≤ n} and from 
min{Vi : 1 ≤ i ≤ n}? 

10.9. Suppose we have a method for simulating random 
variables from the distributions F1 and F2. Explain how to 
simulate from the distribution 


    F(x) = pF1(x) + (1 − p)F2(x),   0 < p < 1 


Give a method for simulating from 


= i= 4 ey eee 
» 2) 
ie — e 3%) 4 ; x1 


10.10. In Example 2c we simulated the absolute value of a 
unit normal by using the rejection procedure on exponen- 
tial random variables with rate 1. This raises the question 
of whether we could obtain a more efficient algorithm by 
using a different exponential density—that is, we could use 
the density g(x) = λe^(−λx). Show that the mean number 
of iterations needed in the rejection scheme is minimized 
when λ = 1. 


10.11. How can the rejection method be used to simulate 
random values with probability density function 


15 
f= ar =*) DS2e22 


using reference density g(x) = 5? 


Self-Test Problems and Exercises 
10.1. The random variable X has probability density 
function f(x) = Ce^x, 0 < x < 1. 


(a) Find the value of the constant C. 
(b) Give a method for simulating such a random variable. 


10.2. Give an approach for simulating a random variable 
having probability density function 


    f(x) = 30(x² − 2x³ + x⁴),   0 < x < 1 

10.3. Give an efficient algorithm to simulate the value of a 
random variable with probability mass function 


    p1 = .15   p2 = .2   p3 = .35   p4 = .30 




10.12. Explain how you could use random numbers to 
approximate ∫_{0}^{1} k(x) dx, where k(x) is an arbitrary 
function. 
Hint: If U is uniform on (0, 1), what is E[k(U)]? 


10.13. Let (X, Y) be uniformly distributed in the circle of 
radius 1 centered at the origin. Its joint density is thus 


    f(x, y) = 1/π,   0 ≤ x² + y² ≤ 1 

Let R = (X² + Y²)^(1/2) and Θ = tan⁻¹(Y/X) denote 
the polar coordinates of (X, Y). Show that R and Θ are 
independent, with R² being uniform on (0, 1) and Θ being 
uniform on (0, 2π). 


10.14. In Example 4a, we showed that 


    E[(1 − V²)^(1/2)] = E[(1 − U²)^(1/2)] = π/4 


when V is uniform (−1, 1) and U is uniform (0, 1). Now 
show that 


    Var[(1 − V²)^(1/2)] = Var[(1 − U²)^(1/2)] 
and find their common value. 


10.15. (a) Verify that the minimum of (4.1) occurs when a 
is as given by (4.2). 
(b) Verify that the minimum of (4.1) is given by (4.3). 


10.16. Let X be a random variable on (0, 1) whose density 
is f(x). Show that we can estimate ∫_{0}^{1} g(x) dx by 
simulating X and then taking g(X)/f(X) as our estimate. This 
method, called importance sampling, tries to choose f 
similar in shape to g, so that g(X)/f(X) has a small variance. 


10.4. If X is a normal random variable with mean μ and 
variance σ², define a random variable Y that has the same 
distribution as X and is negatively correlated with it. 


10.5. Let X and Y be independent exponential random 
variables with mean 1. 


(a) Explain how we could use simulation to estimate 
E[e^(XY)]. 

(b) Show how to improve the estimation approach in part 
(a) by using a control variate. 
p²/(1 − 2p + 2p²) 82. .5550 86. .5; .6; .8 87. 9/19; 6/19; 
4/19; 7/15; 53/165; 7/33 91.9/16 94, 97/142; 15/26; 33/102 


95.41 — 1 — py") 96. py(2 — p2)/2s p2/2 — pa) 


6.1/2 72/3 8.1/2 


13. 504; .3629 
Chapter 4 

1. p(4) = 3/95; p(3) = 2/19; p(2.50) = 16/95; p(2.20) = 
6/95; p(2) = 1/19; p(1.50) = 4/19; p(1.20) = 3/38; p(1) = 
14/95; p(0.70) = 12/95; p(0.40) = 3/190 4.5/6 5S.n — 2i; 
1=0! c.cght 6. p(3)=p(—3)=1/8; = p()=p(—1)=3/8 11b. 
logigG +1) 12. p(4)=1/16; p(3)=1/8; p(2)=1/16; p(0)=1/2; 
p(—i)=p(); p(0)=1 13. E(X) = $880.50 14. p(0)=1/2; 
p(1)=1/6; p(2)=1/12; p(3)=1/20; p(4)=1/5 16. k/(k + DY}, 
1s k <n, 1/n!, k=n 17.1/2; 1/4; 1/2 19. 1/4; 3/8; 
1/8; 1/8; 1/8; all other probabilities are 0 20. .5918; no; 
—.108 21. 39.28; 37 24. p=11/18; maximum=23/72 
25. .225; 385 27 A(p + 1/10) 28.55/21 31. p* 
32.11 — 10(.9)10 33.3 35. .8235; .8218 38. 82.2; 84.5 
40. .3125 41..0197 43.2.8; 1476 46.3 52. 17/12; 
99/60 53. 7/40;7/120 54. .7408;.0036 56.1 — e7 6:1 — 
e719.18 57, 0054; .9354 58. .0125 59. n = 676 log(4) 
63. 8886 64. .9596 66. .0183; .3712 68. .3935; .2293; 
3935 69. 2/(2n — 1);2/(Q2n — 2);e7! 70. 2/n; (Qn — 3)/ 
(n — 177; eo 2 7h e~ 10e™ 723.p + (1 — pye* 
74, 1500; .1012 76. 5.8125 77. 32/243; 4864/6561; 
160/729; 160/729 81. 5(6"-}y/11” 84. 3/10; 5/6; 75/138 
85. .757 86.1.9 89. .1793; 1/3; 4/3 


Chapter 5 

2..6109 3.no; yes 4.1/2, 8999 5.1 — (.01)!/5 
6.4,0,00 7% —43/14; 5/14 8.8930 10, 2/3; 2/3 
11.2/5 13. 3/5; 15/21 15. .7791; .2417; .4389; .1780; 


6778 16. (.9938)!9 17, 315; 136 = 18. 7.1053; 6.5789 
19. (In4)/2 20. .9803; .0015; .7098 22. .9476 23. .5178; 
2398 26. .0606; .0525 28. .9545 29. .9993 32. e~ 4/3; 
e~7/3 33, V has Pareto distribution with minimum param- 
eter a and index parameter A 34. el: 1/3 35. Weibull 
random variable with parameters v = 0,a=1,6=2 39. 3/5 
41.a=7/9;b=56/9 42. 1/c(b—a) 


Chapter 6 
2. (a) 14/39; 10/39; 10/39; 5/39 (b) 84; 70; 70; 70; 40; 
40; 40; 15 all divided by 429 3. 15/26; 5/26; 5/26; 1/26 


4. (a) 64/169; 40/169; 40/169; 25/169 6. .20, .30, .30, 
20; 18, .30, .31, .21; 2.5; 2.55; 1.05; 1.0275 7. p(i,j)=p” 

ail ayia e ee aan 
d= py 8c = 4/3; goad = 16/45 9 <a: 
2x(1—e7*) . d-y-)e” : : 
14-0 for0O = x rn ie “e-1G—e) for0 = y= fi: .8799; 
7327, 3188 10.22: 1913 12. .8243 13. 1/6; 1/2 


24a 


15.7/4 16.n(1/2)"-) 17..0025 19. 30) <x <1: 


‘ B(3,2) ° 
2y" — 3y* +1 . 3/5. . 1.a+2 : 
~SBG2) 9 <y< is 3/5; 3 21. 13 ae 22. no; 
13 23, 1/2; 2/3; 120; 118 2.e ji! 28, he“: 


1-3e-2 29, .954; 4502 30. 1766; .9973 31. 0829; 
3766 = 32. .0625;.1587 33. P(X, + X2 < 10000); P(X, > 


5100) 34a — 2". 35. (a) .6572; (b) yes; (d) .2402 
36. .9346 37. .04; 3758 39. 5/13; 8/13 40. 1/6; 5/6; 
1/4; 3/4 45. (y + 1)2xeXOTD; xe; e446. 12; 
qa y <x <1 50.((L—2d)/L)> 51. .79297 
§52.1-(1-—p)™;[1-(—p)"|" 56.r/x 572 r 60. (a) 
u/(v + 1)? 

Chapter 7 

1. 52.5/12 2.324; 1988 3.1/3; 5/9; 1 4.2/3; 5/9; 
16/9 5.31188 6.35 7722; .02;.76 81 10.6; 
0 11.27 — 1p — p)_ 12. Gn? — n)/(4n — 2), 
3n2/(4n — 2) Wdm/ — p) 15.12 18.1 
21. 4.815; 44.3375 22.14.7 23. 147/110 —.26. I /na; 
W(n + Da) 29. BF; 12; 4; BB 3.175/6 33. a? — 
dab + 5b2;b2 34, 2182; 2407 Ss 35. 4; 42; 14.7 
36. —0.16n. 37, -1/12 38. 102/45, 84/45; .5956, 
6489; .0771, 1436 40. .1033 43. 5952; .4732; .008 
44, 100/19; 16,200/6137; 10/19; 3240/6137 — 47. 1/,/2; 
3/15. 49. 1/(n — 1) 50.35/18; 7; 3 51. 2.8078 
—1 
ae E a ia E = rte | Gey 
(kK +2) 55.12 56.8 58. N(1 — e~!9/N) 59, 30000 
64.7 +1, Yeo Apia — pie“ 44 + 8/6; 1 + 
pid — p)s(1 — pye/(e? — p) 65. 1/2; 1/3; 
I/(nn +1) 68. -96/145 70. 4.2; 5.16 71. 218 
72. x[1 + 2p —1)?}" 74. 1/2; 1/16; 2/81 75. 1/2, 1/3 


77. 1/i: [ii + 1)]~!;00 78. uw; 1 + 07; yes;o72 84, 8186; 
7055 

Chapter 8 

1. =19/20 2. 15/17; =3/4; =10 3.23 = 4. <4/3; 8428 
5..1416 6..9431 7.3085 8.6932 9.19 10. 77715 
11. =.057 13. .0162; .0003; .2514; .2514  14.n = 23 


16. .0206; .0250; .254 18.=.2 23. .769; .357; .4267; .1093; 
112184 24. answer is (a) 


Chapter 9 


1. 1/9; 5/9 3. .9953;  .9735; .9098; .7358 
14. 2.585; 5417; 3.1267 15. 5.5098 


10. (b)1/6 


SOLUTIONS TO SELF-TEST PROBLEMS AND EXERCISES 


Chapter 1 


1.1. (a) There are 4! different orderings of the letters C, D, 
E, F. For each of these orderings, we can obtain an order- 
ing with A and B next to each other by inserting A and B, 
either in the order A, B or in the order B, A, in any of 5 
places, namely, either before the first letter of the permuta- 
tion of C, D, E, F, or between the first and second, and so on. 
Hence, there are 2 - 5 - 4! = 240 arrangements. Another way 
of solving this problem is to imagine that B is glued to the 
back of A. Then there are 5! orderings in which A is immedi- 
ately before B. Since there are also 5! orderings in which B is 
immediately before A, we again obtain a total of 2 - 5! = 240 
different arrangements. 

(b) There are 6! = 720 possible arrangements, and since 
there are as many with A before B as with B before A, there 
are 360 arrangements. 

(c) Of the 720 possible arrangements, there are as many that 
have A before B before C as have any of the 3! possible 
orderings of A, B, and C. Hence, there are 720/6 = 120 pos- 
sible orderings. 

(d) Of the 360 arrangements that have A before B, half will 
have C before D and half D before C. Hence, there are 180 
arrangements having A before B and C before D. 

(e) Gluing B to the back of A and D to the back of C yields 
4! = 24 different orderings in which B immediately follows 
A and D immediately follows C. Since the order of A and 
B and of C and D can be reversed, there are 4 - 24 = 96 
different arrangements. 

(f) There are 5! orderings in which E is last. Hence, there are 
6! — 5! = 600 orderings in which E is not last. 
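
The six counts in Solution 1.1 can be confirmed by brute force, since there are only 6! = 720 orderings of the letters A through F. The sketch below is an illustrative check; the helper names are assumptions made here, and part (e) is read as "A next to B and C next to D", matching the count of 96.

```python
from itertools import permutations


def count(pred):
    """Count orderings of ABCDEF that satisfy the predicate."""
    return sum(1 for p in permutations("ABCDEF") if pred("".join(p)))


def adjacent(s, x, y):
    return abs(s.index(x) - s.index(y)) == 1


def before(s, x, y):
    return s.index(x) < s.index(y)


counts = {
    "a": count(lambda s: adjacent(s, "A", "B")),
    "b": count(lambda s: before(s, "A", "B")),
    "c": count(lambda s: before(s, "A", "B") and before(s, "B", "C")),
    "d": count(lambda s: before(s, "A", "B") and before(s, "C", "D")),
    "e": count(lambda s: adjacent(s, "A", "B") and adjacent(s, "C", "D")),
    "f": count(lambda s: s[-1] != "E"),
}
print(counts)
```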


1.2. 3! 4! 3! 3!, since there are 3! possible orderings of coun- 
tries and then the countrymen must be ordered. 

1.3. (a) 10 · 9 · 8 = 720 

(b) 8 · 7 · 6 + 2 · 3 · 8 · 7 = 672. The result of part (b) follows 
because there are 8 · 7 · 6 choices not including A or B and 
there are 3 · 8 · 7 choices in which a specified one of A and 
B, but not the other, serves. The latter follows because the 
serving member of the pair can be assigned to any of the 3 
offices, the next position can then be filled by any of the other 
8 people, and the final position by any of the remaining 7. 
(c) 8 · 7 · 6 + 3 · 2 · 8 = 384. 

(d) 3 · 9 · 8 = 216. 

(e) 9 · 8 · 7 + 9 · 8 = 576. 


1.6. There are C(7, 3) = 35 choices of the three places for the 
letters. For each choice, there are (26)³(10)⁴ different license 
plates. Hence, altogether there are 35 · (26)³ · (10)⁴ different 
plates. 

1.7. Any choice of r of the n items is equivalent to a choice 
of n − r, namely, those items not selected. 
1.8. (a) 10 · 9 · 9 ··· 9 = 10 · 9^(n−1) 

(b) C(n, i) 9^(n−i), since there are C(n, i) choices of the i places to 
put the zeroes and then each of the other n − i positions can 
be any of the digits 1,...,9. 


1.9. (a) C(3n, 3) 
(b) 3 C(n, 3) 
(c) C(3, 1) C(2, 1) C(n, 2) C(n, 1) = 3n²(n − 1) 

(d) n³ 

(e) C(3n, 3) = 3 C(n, 3) + 3n²(n − 1) + n³ 


1.10. There are 9 · 8 · 7 · 6 · 5 numbers in which no digit is 
repeated. There are C(5, 2) · 8 · 7 · 6 numbers in which only 
one specified digit appears twice, so there are 9 · C(5, 2) · 8 · 7 · 6 
numbers in which only a single digit appears twice. There are 
7 · 5!/(2!2!) numbers in which two specified digits appear twice, 
so there are C(9, 2) · 7 · 5!/(2!2!) numbers in which two digits appear 
twice. Thus, the answer is 


    9 · 8 · 7 · 6 · 5 + 9 · C(5, 2) · 8 · 7 · 6 + C(9, 2) · 7 · 5!/(2!2!) 


1.11. (a) We can regard this as a seven-stage experiment. 
First choose the 6 married couples that have a representative 
in the group, and then select one of the members of each of 
these couples. By the generalized basic principle of counting, 
there are C(10, 6) 2⁶ different choices. 


(b) First select the 6 married couples that have a representative 
in the group, and then select the 3 of those couples that 
are to contribute a man. Hence, there are C(10, 6) C(6, 3) = 10!/(4!3!3!) 
different choices. Another way to solve this is to first select 
3 men and then select 3 women not related to the selected 
men. This shows that there are C(10, 3) C(7, 3) = 10!/(3!3!4!) different 
choices. 


1.12. C(8, 3) C(7, 3) + C(8, 4) C(7, 2) = 3430. The first term gives the 
number of committees that have 3 women and 3 men; the 
second gives the number that have 4 women and 2 men. 


1.13. (number of solutions of x1 + ··· + x5 = 4) (number 
of solutions of x1 + ··· + x5 = 5) (number of solutions of 
x1 + ··· + x5 = 6) = C(8, 4) C(9, 4) C(10, 4) 


1.14. Since there are C(j − 1, n − 1) positive vectors whose sum 
is j, there must be Σ_{j=n}^{k} C(j − 1, n − 1) such vectors. But 
C(j − 1, n − 1) is the number of subsets of size n from the set of 
numbers {1,...,k} in which j is the largest element in the subset. 
Consequently, Σ_{j=n}^{k} C(j − 1, n − 1) is just the total number of 
subsets of size n from a set of size k, showing that the preceding 
answer is equal to C(k, n). 


1.15. Let us first determine the number of different results in 
which k people pass. Because there are C(n, k) different groups 
of size k and k! possible orderings of their scores, it follows 
that there are C(n, k) k! possible results in which k people pass. 
Consequently, there are Σ_{k=0}^{n} C(n, k) k! possible results. 


1.16. The number of subsets of size 4 is C(20, 4) = 4845. Because 
the number of these that contain none of the first five 
elements is C(15, 4) = 1365, the number that contain at least one is 
3480. Another way to solve this problem is to note that there 
are C(5, i) C(15, 4 − i) that contain exactly i of the first five elements 
and sum this for i = 1, 2, 3, 4. 


1.17. Multiplying both sides by 2, we must show that 

    n(n − 1) = k(k − 1) + 2k(n − k) + (n − k)(n − k − 1) 
This follows because the right side is equal to 

    k²(1 − 2 + 1) + k(−1 + 2n − n − n + 1) + n(n − 1) 


For a combinatorial argument, consider a group of n items 
and a subgroup of k of the n items. Then C(k, 2) is the number 
of subsets of size 2 that contain 2 items from the subgroup of 
size k, k(n − k) is the number that contain 1 item from the 
subgroup, and C(n − k, 2) is the number that contain 0 items from 
the subgroup. Adding these terms gives the total number of 
subgroups of size 2, namely, C(n, 2). 
1.18. There are 3 choices that can be made from families 
consisting of a single parent and 1 child; there are 3 · 1 · 2 = 6 
choices that can be made from families consisting of a single 
parent and 2 children; there are 5 · 2 · 1 = 10 choices that 
can be made from families consisting of 2 parents and a 
single child; there are 7 · 2 · 2 = 28 choices that can be made 
from families consisting of 2 parents and 2 children; there are 
6 · 2 · 3 = 36 choices that can be made from families 
consisting of 2 parents and 3 children. Hence, there are 83 possible 
choices. 


1.19. First choose the 3 positions for the digits, and then put 
in the letters and digits. Thus, there are C(8, 3) · 26 · 25 · 24 · 
23 · 22 · 10 · 9 · 8 different plates. If the digits must be 
consecutive, then there are 6 possible positions for the digits, 
showing that there are now 6 · 26 · 25 · 24 · 23 · 22 · 10 · 9 · 8 
different plates. 


1.20. (a) Follows since n!/(x1! x2! ··· xr!) is the number of n letter 
permutations of the values 1,...,r in which i appears xi times, 


Vey en 
Ove deenat at HO Pe Par 


1.21. 0 = (1 − 1)^n = 1 − C(n, 1) + C(n, 2) − ··· + (−1)^n C(n, n), giving 
that C(n, 1) − C(n, 2) + ··· + (−1)^(n+1) C(n, n) = 1. 


Chapter 2 
2.1. (a) 2 · 3 · 4 = 24 
(b) 2 · 3 = 6 
(c) 3 · 4 = 12 

(d) AB = {(c, pasta, i), (c, rice, i), (c, potatoes, i)} 

(e) 8 

(f) ABC = {(c, rice, i)} 

2.2. Let A be the event that a suit is purchased, B be the 


event that a shirt is purchased, and C be the event that a tie 
is purchased. Then 


P(A ∪ B ∪ C) = .22 + .30 + .28 − .11 − .14 − .10 + .06 = .51 


(a) 1 − .51 = .49 
(b) The probability that two or more items are purchased is 


P(AB ∪ AC ∪ BC) = .11 + .14 + .10 − .06 − .06 
                  − .06 + .06 = .23 


Hence, the probability that exactly 1 item is purchased is 
.51 − .23 = .28. 

2.3. By symmetry, the 14th card is equally likely to be any 
of the 52 cards; thus, the probability is 4/52. A more formal 
argument is to count the number of the 52! outcomes for 
which the 14th card is an ace. This yields 


    P = 4 · 51 · 50 ··· 2 · 1 / (52)! = 4/52 


Letting A be the event that the first ace occurs on the 14th 
card, we have 


    P(A) = 48 · 47 ··· 36 · 4 / (52 · 51 ··· 40 · 39) = .0312 


2.4. Let D denote the event that the minimum temperature 
is 70 degrees. Then 


P(A U B) = P(A) + P(B) — P(AB) = .7— P(AB) 
P(C U D) = P(C)+ P(D) — P(CD) = .2+ P(D) — P(DC) 


Since A U B= CU Dand AB = CD, subtracting one of the 
preceding equations from the other yields 


    0 = .5 − P(D) 


or P(D) =.5. 
2.5. (a) 52 · 48 · 44 · 40 / (52 · 51 · 50 · 49) = .6761 

(b) 52 · 39 · 26 · 13 / (52 · 51 · 50 · 49) = .1055 

2.6. Let R be the event that both balls are red, and let B be 
the event that both are black. Then 

    P(R ∪ B) = P(R) + P(B) = (3 · 4)/(6 · 10) + (3 · 6)/(6 · 10) = 1/2 


2.7. (a) 1/C(40, 8) = 1.3 × 10⁻⁸ 

(b) C(8, 7) C(32, 1)/C(40, 8) = 3.3 × 10⁻⁶ 

2.9. Let S = ∪_{i=1}^{n} Ai, and consider the experiment of 
randomly choosing an element of S. Then P(A) = N(A)/N(S), 
and the results follow from Propositions 4.3 and 4.4. 


2.10. Since there are 5! = 120 outcomes in which the 
position of horse number 1 is specified, it follows that 
N(A) = 360. Similarly, N(B) = 120, and N(AB) = 2 - 
4! = 48. Hence, from Self-Test Problem 2.9, we obtain 
N(A U B) = 432. 


2.11. One way to solve this problem is to start with the 
complementary probability that at least one suit does not appear. 
Let Ai, i = 1, 2, 3, 4, be the event that no cards from suit i 
appear. Then 


    P(∪_{i=1}^{4} Ai) = Σ_i P(Ai) − Σ Σ_{i<j} P(Ai Aj) 
                        + ··· − P(A1A2A3A4) 
                      = 4 C(39, 5)/C(52, 5) − C(4, 2) C(26, 5)/C(52, 5) 
                        + C(4, 3) C(13, 5)/C(52, 5) 
                      = 4 C(39, 5)/C(52, 5) − 6 C(26, 5)/C(52, 5) 
                        + 4 C(13, 5)/C(52, 5) 


The desired probability is then 1 minus the preceding. 
Another way to solve is to let A be the event that all 4 suits 
are represented, and then use 


    P(A) = P(n, n, n, n, o) + P(n, n, n, o, n) + P(n, n, o, n, n) 
           + P(n, o, n, n, n) 


where P(n, n, n, o, n), for instance, is the probability that the 
first card is from a new suit, the second is from a new suit, the 
third is from a new suit, the fourth is from an old suit (that is, 
one which has already appeared) and the fifth is from a new 
suit. This gives 

    P(A) = [52 · 39 · 26 · 13 · 48 + 52 · 39 · 26 · 36 · 13 
            + 52 · 39 · 24 · 26 · 13 + 52 · 12 · 39 · 26 · 13] 
           / (52 · 51 · 50 · 49 · 48) 

         = 52 · 39 · 26 · 13 (48 + 36 + 24 + 12) / (52 · 51 · 50 · 49 · 48) 

         = .2637 
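
The two routes to the answer in Solution 2.11 can be cross-checked numerically. The sketch below assumes the inclusion-exclusion terms are 4 C(39, 5), 6 C(26, 5), and 4 C(13, 5), each divided by C(52, 5), with the four-suit term equal to 0 because a 5-card hand cannot avoid every suit; the variable names are illustrative.

```python
from math import comb

total = comb(52, 5)

# Inclusion-exclusion: probability that at least one suit is missing.
p_missing = (4 * comb(39, 5) - 6 * comb(26, 5) + 4 * comb(13, 5)) / total
p_all_four_ie = 1 - p_missing

# Sequential argument: sum over the position of the one "old suit" card.
numerator = 52 * 39 * 26 * 13 * (48 + 36 + 24 + 12)
denominator = 52 * 51 * 50 * 49 * 48
p_all_four_seq = numerator / denominator

print(round(p_all_four_ie, 4), round(p_all_four_seq, 4))
```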


2.12. There are (10)!/2⁵ different divisions of the 10 
players into a first roommate pair, a second roommate pair, and 
so on. Hence, there are (10)!/(5!2⁵) divisions into 5 
roommate pairs. There are C(6, 2) C(4, 2) ways of choosing the 
frontcourt and backcourt players to be in the mixed roommate 
pairs and then 2 ways of pairing them up. As there is then 
1 way to pair up the remaining two backcourt players and 
4!/(2!2²) = 3 ways of making two roommate pairs from 
the remaining four frontcourt players, the desired 
probability is 

    P{2 mixed pairs} = C(6, 2) C(4, 2) (2)(3) / [(10)!/(5!2⁵)] = .5714 

2.13. Let R denote the event that letter R is repeated; 
similarly, define the events E and V. Then 


    P{same letter} = P(R) + P(E) + P(V) 
                   = (2/7)(1/8) + (3/7)(1/8) + (1/7)(1/8) = 3/28 


2.14. Let B1 = A1, Bi = Ai (∪_{j=1}^{i−1} Aj)^c, i > 1. Then 

    P(∪_{i=1}^{∞} Ai) = P(∪_{i=1}^{∞} Bi) 
                      = Σ_{i=1}^{∞} P(Bi) 
                      ≤ Σ_{i=1}^{∞} P(Ai) 

where the final equality uses the fact that the Bi are mutually 
exclusive. The inequality then follows, since Bi ⊂ Ai. 


2.15. 

    P(∩_{i=1}^{∞} Ai) = 1 − P((∩_{i=1}^{∞} Ai)^c) 
                      = 1 − P(∪_{i=1}^{∞} Ai^c) 
                      ≥ 1 − Σ_{i=1}^{∞} P(Ai^c) 
                      = 1 


2.16. The number of partitions for which {1} is a subset is 
equal to the number of partitions of the remaining n − 1 
elements into k − 1 nonempty subsets, namely, T_{k−1}(n − 1). 
Because there are T_k(n − 1) partitions of {2,...,n} into 
k nonempty subsets and then a choice of k of them in which 
to place element 1, it follows that there are kT_k(n − 1) 
partitions for which {1} is not a subset. Hence, the result follows. 


2.17. Let R, W, B denote, respectively, the events that there 
are no red, no white, and no blue balls chosen. Then 


    P(R ∪ W ∪ B) = P(R) + P(W) + P(B) − P(RW) 
                   − P(RB) − P(WB) + P(RWB) 
                 = [C(13, 5) + C(12, 5) + C(11, 5) 
                    − C(7, 5) − C(6, 5) − C(5, 5)] / C(18, 5) 
                 ≈ 0.2933 


Thus, the probability that all colors appear in the chosen 
subset is approximately 1 − 0.2933 = 0.7067. 
2.18. (a) 8 · 7 · 6 · 5 · 4 / (17 · 16 · 15 · 14 · 13) = 2/221 

(b) Because there are 9 nonblue balls, the probability is 
9 · 8 · 7 · 6 · 5 / (17 · 16 · 15 · 14 · 13) = 9/442. 

(c) Because there are 3! possible orderings of the different 
colors and all possibilities for the final 3 balls are equally 
likely, the probability is 3! · 4 · 8 · 5 / (17 · 16 · 15) = 4/17. 
(d) The probability that the red balls are in a specified 4 spots 
is 4 · 3 · 2 · 1 / (17 · 16 · 15 · 14). Because there are 14 possible 
locations of the red balls where they are all together, the 
probability is 14 · 4 · 3 · 2 · 1 / (17 · 16 · 15 · 14) = 1/170. 

2.19. (a) The probability that the 10 cards consist 
of 4 spades, 3 hearts, 2 diamonds, and 1 club is 

    C(13, 4) C(13, 3) C(13, 2) C(13, 1) / C(52, 10) 

Because there are 4! possible choices of the suits to have 
4, 3, 2, and 1 cards, respectively, it follows that the 
probability is 

    24 C(13, 4) C(13, 3) C(13, 2) C(13, 1) / C(52, 10) 

(b) Because there are C(4, 2) = 6 choices of the two suits that 
are to have 3 cards and then 2 choices for the suit to have 4 
cards, the probability is 

    12 C(13, 3) C(13, 3) C(13, 4) / C(52, 10) 

2.20. All the red balls are removed before all the blue ones 
if and only if the very last ball removed is blue. Because all 
30 balls are equally likely to be the last ball removed, the 
probability is 10/30. 


Chapter 3 


3.1. (a) P(no aces) = C(35, 13)/C(39, 13) 

(b) 1 − P(no aces) − 4 C(35, 12)/C(39, 13) 

(c) P(i aces) = C(3, i) C(36, 13 − i)/C(39, 13) 


3.2. Let Li denote the event that the life of the battery is 
greater than 10,000 × i miles. 

(a) P(L2|L1) = P(L1L2)/P(L1) = P(L2)/P(L1) = 1/2 

(b) P(L3|L1) = P(L1L3)/P(L1) = P(L3)/P(L1) = 1/8 


3.3. Put 1 white and 0 black balls in urn one, and the remain- 
ing 9 white and 10 black balls in urn two. 


3.4. Let T be the event that the transferred ball is white, 
and let W be the event that a white ball is drawn from urn 
B. Then 


    P(T|W) = P(W|T)P(T) / [P(W|T)P(T) + P(W|T^c)P(T^c)] 
           = (2/7)(2/3) / [(2/7)(2/3) + (1/7)(1/3)] = 4/5 

3.5. (a) P(E|E ∪ F) = P(E(E ∪ F))/P(E ∪ F) = P(E)/[P(E) + P(F)] 


since E(E ∪ F) = E and P(E ∪ F) = P(E) + P(F) because 
E and F are mutually exclusive. 


(b) P(Ej | ∪_{i=1}^{∞} Ei) = P(Ej (∪_{i=1}^{∞} Ei))/P(∪_{i=1}^{∞} Ei) = P(Ej)/Σ_{i=1}^{∞} P(Ei) 

3.6. Let Bi denote the event that ball i is black, and let 
Ri = Bi^c. Then 


    P(B1|R2) = P(R2|B1)P(B1) / [P(R2|B1)P(B1) + P(R2|R1)P(R1)] 

             = [r/(b + r + c)][b/(b + r)] 
               / ([r/(b + r + c)][b/(b + r)] + [(r + c)/(b + r + c)][r/(b + r)]) 

             = b/(b + r + c) 

3.7. Let B denote the event that both cards are aces. 


(a) P{B|yes to ace of spades} 
        = P{B, yes to ace of spades} / P{yes to ace of spades} 
        = [C(1, 1) C(3, 1)/C(52, 2)] / [C(1, 1) C(51, 1)/C(52, 2)] 
        = 3/51 


(b) Since the second card is equally likely to be any of the 
remaining 51, of which 3 are aces, we see that the answer in 
this situation is also 3/51. 

(c) Because we can always interchange which card is 
considered first and which is considered second, the result should 
be the same as in part (b). A more formal argument is as 
follows: 


    P{B|second is ace} = P{B, second is ace} / P{second is ace} 
                       = P(B) / [P(B) + P{first is not ace, second is ace}] 
                       = (4/52)(3/51) / [(4/52)(3/51) + (48/52)(4/51)] 
                       = 3/51 


(d) P{B|at least one} = P(B) / P{at least one} 
                      = (4/52)(3/51) / [1 − (48/52)(47/51)] 
                      = 1/33 


3.8. P(H|E)/P(G|E) = P(HE)/P(GE) = P(H)P(E|H) / [P(G)P(E|G)] 


Hypothesis H is 1.5 times as likely. 


3.9. Let A denote the event that the plant is alive and let W 
be the event that it was watered. 


(a) P(A) = P(A|W)P(W) + P(A|W^c)P(W^c) 
         = (.85)(.9) + (.2)(.1) = .785 


(b) P(W^c|A^c) = P(A^c|W^c)P(W^c) / P(A^c) 
               = (.8)(.1)/.215 = 16/43 

3.10. (a) Let R be the event that at least one red ball is 
chosen. Then 


    P(R) = 1 − P(R^c) = 1 − C(22, 6)/C(30, 6) 

(b) Let G2 be the event there are exactly 2 green balls 
chosen. Working with the reduced sample space yields 


    P(G2|R^c) = C(10, 2) C(12, 4)/C(22, 6) 
3.11. Let W be the event that the battery works, and let C 
and D denote the events that the battery is a type C and that 
it is a type D battery, respectively. 
(a) P(W) = P(W|C)P(C) + P(W|D)P(D) = .7(8/14) + 
.4(6/14) = 4/7 
(b) P(C|W^c) = P(CW^c)/P(W^c) = P(W^c|C)P(C)/(3/7) = .3(8/14)/(3/7) = .4 
3.12. Let Li be the event that Maria likes book i, i = 1, 2. 
Then 


    P(L2|L1^c) = P(L1^c L2)/P(L1^c) = P(L1^c L2)/.4 


Using that L2 is the union of the mutually exclusive events 
L1L2 and L1^c L2, we see that 


    .5 = P(L2) = P(L1L2) + P(L1^c L2) = .4 + P(L1^c L2) 
Thus, 

    P(L2|L1^c) = .1/.4 = .25 

3.13. (a) The sample space of interviews that have been held 
in sequence can be represented as 


    Ω = {(x, y, z) : x, y, z ∈ {I, G1, G2}} 


where I is the innocent person and G1, G2 are the two guilty 
persons. 


Because the probability of each possible sequence of 
interviews is 1/6, the probability that the first interviewee is guilty 
is 2/3. 

(b) Let event one where the interviewee has been identified 
as guilty after the first interview correspond to A. Then 


    A = {(G1, I, G2), (G1, G2, I), (G2, I, G1), (G2, G1, I)} 


has probability 4/6. Let event two where the second 
interviewee is innocent correspond to B. Then 


    B = {(G1, I, G2), (G2, I, G1)} 
has probability 1/3. Also, A ∩ B = B. Thus, 


    P[B|A] = P[A ∩ B]/P[A] = (1/3)/(2/3) = 1/2 


The probability that the next person to be interviewed is 
innocent is 1/2. 


3.14. Let H be the event that the coin lands heads, let Th be 
the event that B is told that the coin landed heads, let F be 
the event that A forgets the result of the toss, and let C be the 
event that B is told the correct result. Then 


(a) P(Th) = P(Th|F)P(F) + P(Th|F^c)P(F^c) 
          = (.5)(.4) + P(H)(.6) 
          = .68 


(b) P(C) = P(C|F)P(F) + P(C|F^c)P(F^c) 
         = (.5)(.4) + 1(.6) = .80 


(c) P(H|Th) = P(HTh)/P(Th) 
Now, 


    P(HTh) = P(HTh|F)P(F) + P(HTh|F^c)P(F^c) 
           = P(H|F)P(Th|HF)P(F) + P(H)P(F^c) 
           = (.8)(.5)(.4) + (.8)(.6) = .64 


giving the result P(H|Th) = .64/.68 = 16/17. 
3.15. Since the black rat has a brown sibling, we can 
conclude that both of its parents have one black and one 
brown gene. 

(a) P(2 black|at least one) = P(2)/P(at least one) = (1/4)/(3/4) = 1/3 

(b) Let F be the event that all 5 offspring are black, let B2 be 
the event that the black rat has 2 black genes, and let B1 be 
the event that it has 1 black and 1 brown gene. Then 


    P(B2|F) = P(F|B2)P(B2) / [P(F|B2)P(B2) + P(F|B1)P(B1)] 
            = (1)(1/3) / [(1)(1/3) + (1/2)⁵(2/3)] = 16/17 


3.16. Let $F$ be the event that a current flows from A to B, and let $C_i$ be the event that relay $i$ closes. Then

$$P(F) = P(F|C_1)p_1 + P(F|C_1^c)(1 - p_1)$$

Now,

$$P(F|C_1) = P(C_4 \cup C_2C_5 \cup C_3C_5) = p_4 + p_2p_5 + p_3p_5 - p_4p_2p_5 - p_4p_3p_5 - p_2p_3p_5 + p_4p_2p_5p_3$$

Also,

$$P(F|C_1^c) = P(C_2C_5 \cup C_2C_3C_4) = p_2p_5 + p_2p_3p_4 - p_2p_3p_4p_5$$

Hence, for part (a), we obtain

$$P(F) = p_1(p_4 + p_2p_5 + p_3p_5 - p_4p_2p_5 - p_4p_3p_5 - p_2p_3p_5 + p_4p_2p_5p_3) + (1 - p_1)p_2(p_5 + p_3p_4 - p_3p_4p_5)$$

For part (b), let $q_i = 1 - p_i$. Then

$$P(C_3|F) = P(F|C_3)P(C_3)/P(F) = p_3[1 - P(C_1^cC_2^c \cup C_4^cC_5^c)]/P(F) = p_3(1 - q_1q_2 - q_4q_5 + q_1q_2q_4q_5)/P(F)$$


3.17. Let $A$ be the event that component 1 is working, and let $F$ be the event that the system functions.

(a)
$$P(A|F) = \frac{P(AF)}{P(F)} = \frac{P(A)}{P(F)} = \frac{1/2}{1 - (1/2)^2} = \frac{2}{3}$$

where $P(F)$ was computed by noting that it is equal to 1 minus the probability that components 1 and 2 are both failed.

(b)
$$P(A|F) = \frac{P(AF)}{P(F)} = \frac{P(F|A)P(A)}{P(F)} = \frac{(3/4)(1/2)}{(1/2)^3 + 3(1/2)^3} = \frac{3}{4}$$

where $P(F)$ was computed by noting that it is equal to the probability that all 3 components work plus the three probabilities relating to exactly 2 of the components working.


3.18. If we assume that the outcomes of the successive spins 
are independent, then the conditional probability of the next 
outcome is unchanged by the result that the previous 10 spins 
landed on black. 
3.19. Condition on the outcome of the initial tosses:

$$P(A \text{ odd}) = P_1(1 - P_2)(1 - P_3) + (1 - P_1)P_2P_3 + P_1P_2P_3P(A \text{ odd}) + (1 - P_1)(1 - P_2)(1 - P_3)P(A \text{ odd})$$

so,

$$P(A \text{ odd}) = \frac{P_1(1 - P_2)(1 - P_3) + (1 - P_1)P_2P_3}{P_1 + P_2 + P_3 - P_1P_2 - P_1P_3 - P_2P_3}$$


3.20. Let $A$ and $B$ be the events that the first trial is larger and that the second is larger, respectively. Also, let $E$ be the event that the results of the trials are equal. Then

$$1 = P(A) + P(B) + P(E)$$

But, by symmetry, $P(A) = P(B)$: thus,

$$P(B) = \frac{1 - P(E)}{2} = \frac{1 - \sum_{i=1}^{n} p_i^2}{2}$$

Another way of solving the problem is to note that

$$P(B) = \sum_i \sum_{j>i} P\{\text{first trial results in } i, \text{ second trial results in } j\} = \sum_i \sum_{j>i} p_i p_j$$

To see that the two expressions derived for $P(B)$ are equal, observe that

$$1 = \sum_{i=1}^{n} p_i \sum_{j=1}^{n} p_j = \sum_i \sum_j p_i p_j = \sum_i p_i^2 + \sum_i \sum_{j \ne i} p_i p_j = \sum_i p_i^2 + 2\sum_i \sum_{j>i} p_i p_j$$
3.21. Let $E = \{A \text{ gets more heads than } B\}$; then

$$P(E) = P(E \mid A \text{ leads after both flip } n)P(A \text{ leads after both flip } n) + P(E \mid \text{even after both flip } n)P(\text{even after both flip } n) + P(E \mid B \text{ leads after both flip } n)P(B \text{ leads after both flip } n)$$

$$= P(A \text{ leads}) + \frac{1}{2}P(\text{even})$$

Now, by symmetry,

$$P(A \text{ leads}) = P(B \text{ leads}) = \frac{1 - P(\text{even})}{2}$$

Hence,

$$P(E) = \frac{1}{2}$$
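
The value 1/2 can be confirmed by exact enumeration for small $n$. The sketch below assumes the setting suggested by the conditioning argument, namely that A flips $n + 1$ fair coins and B flips $n$; the function name is illustrative.

```python
from fractions import Fraction
from math import comb

def p_a_more_heads(n):
    """Exact P(A gets more heads than B) when A flips n+1 fair coins and B flips n."""
    total = Fraction(0)
    for a in range(n + 2):        # heads obtained by A
        for b in range(n + 1):    # heads obtained by B
            if a > b:
                total += Fraction(comb(n + 1, a) * comb(n, b), 2 ** (2 * n + 1))
    return total

for n in range(6):
    print(n, p_a_more_heads(n))
```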


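3.22. (a) Not true: In rolling 2 dice, let $E = \{\text{sum is } 7\}$, $F = \{\text{1st die does not land on 4}\}$, and $G = \{\text{2nd die does not land on 3}\}$. Then

$$P(E \mid F \cup G) = \frac{P\{7, \text{not } (4,3)\}}{P\{\text{not } (4,3)\}} = \frac{5/36}{35/36} = 5/35 \ne P(E)$$

(b)
$$P(E(F \cup G)) = P(EF \cup EG)$$
$$= P(EF) + P(EG) \quad \text{since } EFG = \emptyset$$
$$= P(E)[P(F) + P(G)]$$
$$= P(E)P(F \cup G) \quad \text{since } FG = \emptyset$$

(c)
$$P(G \mid EF) = \frac{P(EFG)}{P(EF)}$$
$$= \frac{P(E)P(FG)}{P(EF)} \quad \text{since } E \text{ is independent of } FG$$
$$= \frac{P(E)P(F)P(G)}{P(E)P(F)} \quad \text{by independence}$$
$$= P(G).$$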


3.23. (a) necessarily false; if they were mutually exclusive, 
then we would have 


0 = P(AB) ≠ P(A)P(B)


(b) necessarily false; if they were independent, then we 
would have 


P(AB) = P(A)P(B) > 0 


(c) necessarily false; if they were mutually exclusive, then we 
would have 


P(A U B)= P(A) + P(B) = 1.2 
(d) possibly true 


3.24. The probabilities in parts (a), (b), and (c) are .5, $(.8)^3 = .512$, and $(.9)^7 \approx .4783$, respectively.


3.25. Let $D_i$, $i = 1, 2$, denote the event that radio $i$ is defective. Also, let $A$ and $B$ be the events that the radios were produced at factory A and at factory B, respectively. Then

$$P(D_2|D_1) = \frac{P(D_1D_2)}{P(D_1)} = \frac{P(D_1D_2|A)P(A) + P(D_1D_2|B)P(B)}{P(D_1|A)P(A) + P(D_1|B)P(B)} = \frac{(.05)^2(1/2) + (.01)^2(1/2)}{(.05)(1/2) + (.01)(1/2)} = 13/300$$


3.26. We are given that $P(AB) = P(B)$ and must show that this implies that $P(B^cA^c) = P(A^c)$. One way is as follows:

$$P(B^cA^c) = P((A \cup B)^c) = 1 - P(A \cup B) = 1 - P(A) - P(B) + P(AB) = 1 - P(A) = P(A^c)$$

3.27. The result is true for $n = 0$. With $A_i$ denoting the event that there are $i$ red balls in the urn after stage $n$, assume that

$$P(A_i) = \frac{1}{n + 1}, \quad i = 1, \ldots, n + 1$$

Now let $B_j$, $j = 1, \ldots, n + 2$, denote the event that there are $j$ red balls in the urn after stage $n + 1$. Then

$$P(B_j) = \sum_{i=1}^{n+1} P(B_j|A_i)P(A_i) = \frac{1}{n+1}\sum_{i=1}^{n+1} P(B_j|A_i) = \frac{1}{n+1}\left[P(B_j|A_{j-1}) + P(B_j|A_j)\right]$$

Because there are $n + 2$ balls in the urn after stage $n$, it follows that $P(B_j|A_{j-1})$ is the probability that a red ball is chosen when $j - 1$ of the $n + 2$ balls in the urn are red and $P(B_j|A_j)$ is the probability that a red ball is not chosen when $j$ of the $n + 2$ balls in the urn are red. Consequently,

$$P(B_j|A_{j-1}) = \frac{j-1}{n+2}, \qquad P(B_j|A_j) = \frac{n+2-j}{n+2}$$

Substituting these results into the equation for $P(B_j)$ gives

$$P(B_j) = \frac{1}{n+1}\left[\frac{j-1}{n+2} + \frac{n+2-j}{n+2}\right] = \frac{1}{n+2}$$

This completes the induction proof.


3.28. If $A_i$ is the event that player $i$ receives an ace, then

$$P(A_i) = 1 - \frac{\binom{2n-2}{n}}{\binom{2n}{n}} = 1 - \frac{1}{2}\,\frac{n-1}{2n-1} = \frac{3n-1}{4n-2}$$

By arbitrarily numbering the aces and noting that the player who does not receive ace number one will receive $n$ of the remaining $2n - 1$ cards, we see that

$$P(A_1A_2) = \frac{n}{2n-1}$$

Therefore,

$$P(A_2^c|A_1) = 1 - P(A_2|A_1) = 1 - \frac{P(A_1A_2)}{P(A_1)} = \frac{n-1}{3n-1}$$

We may regard the card division outcome as the result of two trials, where trial $i$, $i = 1, 2$, is said to be a success if ace number $i$ goes to the first player. Because the locations of the two aces become independent as $n$ goes to infinity, with each one being equally likely to be given to either player, it follows that the trials become independent, each being a success with probability 1/2. Hence, in the limiting case where $n \to \infty$, the problem becomes one of determining the conditional probability that two heads result, given that at least one does, when two fair coins are flipped. Because $\frac{n-1}{3n-1}$ converges to 1/3, the answer agrees with that of Example 2b.


3.29. (a) For any permutation $i_1, \ldots, i_n$ of $1, 2, \ldots, n$, the probability that the successive types collected is $i_1, \ldots, i_n$ is $p_{i_1} \cdots p_{i_n} = \prod_{i=1}^{n} p_i$. Consequently, the desired probability is $n! \prod_{i=1}^{n} p_i$.

(b) For $i_1, \ldots, i_k$ all distinct,

$$P(E_{i_1} \cdots E_{i_k}) = \left(\frac{n-k}{n}\right)^n$$

which follows because there are no coupons of types $i_1, \ldots, i_k$ when each of the $n$ independent selections is one of the other $n - k$ types. It now follows by the inclusion-exclusion identity that

$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{k=1}^{n} (-1)^{k+1}\binom{n}{k}\left(\frac{n-k}{n}\right)^n$$

Because $1 - P(\bigcup_{i=1}^{n} E_i)$ is the probability that one of each type is obtained, by part (a) it is equal to $\frac{n!}{n^n}$. Substituting this into the preceding equation gives

$$1 - \frac{n!}{n^n} = \sum_{k=1}^{n} (-1)^{k+1}\binom{n}{k}\left(\frac{n-k}{n}\right)^n$$

or

$$n! = n^n - \sum_{k=1}^{n} (-1)^{k+1}\binom{n}{k}(n-k)^n$$

or

$$n! = \sum_{k=0}^{n} (-1)^k\binom{n}{k}(n-k)^n$$
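
The final identity, $n! = \sum_{k=0}^{n} (-1)^k \binom{n}{k}(n-k)^n$, is easy to spot-check numerically with exact integer arithmetic. The function name below is illustrative.

```python
from math import comb, factorial

def alternating_sum(n):
    """Right-hand side of the identity: sum over k of (-1)^k C(n,k) (n-k)^n."""
    return sum((-1) ** k * comb(n, k) * (n - k) ** n for k in range(n + 1))

for n in range(1, 9):
    print(n, alternating_sum(n), factorial(n))
```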

3.30.
$$P(E \mid E \cup F) = P(E \mid F(E \cup F))P(F \mid E \cup F) + P(E \mid F^c(E \cup F))P(F^c \mid E \cup F)$$

Using

$$F(E \cup F) = F \quad \text{and} \quad F^c(E \cup F) = F^cE$$

gives

$$P(E \mid E \cup F) = P(E|F)P(F \mid E \cup F) + P(E \mid EF^c)P(F^c \mid E \cup F)$$
$$= P(E|F)P(F \mid E \cup F) + P(F^c \mid E \cup F)$$
$$\ge P(E|F)P(F \mid E \cup F) + P(E|F)P(F^c \mid E \cup F)$$
$$= P(E|F)$$
3.31. (a) 2/5 
(b) 5/6 
3.32. (a) 1/7 
(b) 1/6 
3.33.
$$P(E \mid FG^c) = \frac{P(EFG^c)}{P(FG^c)} = \frac{P(EF) - P(EFG)}{P(F) - P(FG)} = \frac{P(E)P(F) - P(E)P(F)P(G)}{P(F) - P(F)P(G)} = P(E)$$

The second equality in the preceding used that $EF = EFG \cup EFG^c$.

3.34. Let $W_1$ be the event that player 1 wins the contest. Letting $O$ be the event that player 1 does not play in round 1, we obtain, by conditioning on whether or not $O$ occurs, that

$$P(W_1) = P(W_1|O)P(O) + P(W_1|O^c)P(O^c) = P(W_1|O)\frac{1}{3} + \frac{1}{3}\cdot\frac{1}{4}\cdot\frac{2}{3}$$

where the preceding used that if $O^c$ occurs then 1 would have to beat both 2 and 3 to win the tournament. To compute $P(W_1|O)$, condition on which of 2 or 3 wins the first game. Letting $B_i$ be the event that $i$ wins the first game,

$$P(W_1|O) = P(W_1|O, B_2)P(B_2|O) + P(W_1|O, B_3)P(B_3|O) = \frac{1}{3}\cdot\frac{2}{5} + \frac{1}{4}\cdot\frac{3}{5} = 17/60$$

Hence, $P(W_1) = 3/20$. Also,

$$P(O|W_1) = \frac{P(W_1|O)P(O)}{P(W_1)} = \frac{(17/60)(1/3)}{3/20} = 17/27$$


3.35. . 
Pall white) 


P(all white|same) = Fea 


Now, 


(4) + 
7D P(same) = 


B 
P(all white) = My) W) 
(4) (4) 


giving that 


P(all white|same) = 


3.36. Let $B_3$ be the event that 3 beats 4. Because 1 beats 2 with probability 1/3,

$$P(1) = P(1|B_3)P(B_3) + P(1|B_3^c)P(B_3^c) = (1/3)(1/4)(3/7) + (1/3)(1/5)(4/7) = 31/420$$


3.37. (a) Condition on who wins the first game to obtain: 
P(W3) = P(W3|1 wins)(1/3) + P(W3|2 wins) (2/3) 
a. 7 3 
= 0/36/97 [5 + @/96/9 15 
i=4 i=4 
Ba 3 


2040+ 3 


(b) Condition on the opponent of player 4. If O; is the event 
that iis the opponent, i = 1,2,3, then 


11 1 

ee a4 19 

22 4 

OD 35> & 
1 4 13 
BO Sy ge 9G 


Hence, 


3 
P(Wa) = 9) POW4l0;)P(O})) = 
i=1 


41 44 413 194 
5121 615 | 720 ~ 315 


Chapter 4 


4.1. Since the probabilities sum to 1, we must have $4P\{X = 3\} + .5 = 1$, implying that $P\{X = 0\} = .375$, $P\{X = 3\} = .125$. Hence, $E[X] = 1(.3) + 2(.2) + 3(.125) = 1.075$.
4.2. The relationship implies that $p_i = c^i p_0$, $i = 1, 2$, where $p_i = P\{X = i\}$. Because these probabilities sum to 1, it follows that

$$p_0(1 + c + c^2) = 1 \Rightarrow p_0 = \frac{1}{1 + c + c^2}$$

Hence,

$$E[X] = p_1 + 2p_2 = \frac{c + 2c^2}{1 + c + c^2}$$


4.3. Let $X$ be the number of flips. Then the probability mass function of $X$ is

$$p_2 = p^2 + (1 - p)^2, \qquad p_3 = 1 - p_2 = 2p(1 - p)$$

Hence,

$$E[X] = 2p_2 + 3p_3 = 2p_2 + 3(1 - p_2) = 3 - p^2 - (1 - p)^2$$


4.4. The probability that a randomly chosen family will have $i$ children is $n_i/m$. Thus,

$$E[X] = \sum_{i=1}^{r} i n_i / m$$

Also, since there are $in_i$ children in families having $i$ children, it follows that the probability that a randomly chosen child is from a family with $i$ children is $in_i / \sum_{i=1}^{r} in_i$. Therefore,

$$E[Y] = \frac{\sum_{i=1}^{r} i^2 n_i}{\sum_{i=1}^{r} i n_i}$$

Thus, we must show that

$$\frac{\sum_{i=1}^{r} i^2 n_i}{\sum_{i=1}^{r} i n_i} \ge \frac{\sum_{i=1}^{r} i n_i}{\sum_{i=1}^{r} n_i}$$

or, equivalently, that

$$\sum_{j=1}^{r} n_j \sum_{i=1}^{r} i^2 n_i \ge \sum_{i=1}^{r} i n_i \sum_{j=1}^{r} j n_j$$

or, equivalently, that

$$\sum_{i=1}^{r}\sum_{j=1}^{r} i^2 n_i n_j \ge \sum_{i=1}^{r}\sum_{j=1}^{r} ij\, n_i n_j$$

But, for a fixed pair $i, j$, the coefficient of $n_in_j$ in the left-side summation of the preceding inequality is $i^2 + j^2$, whereas its coefficient in the right-hand summation is $2ij$. Hence, it suffices to show that

$$i^2 + j^2 \ge 2ij$$

which follows because $(i - j)^2 \ge 0$.
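
A small numerical illustration of the inequality $E[Y] \ge E[X]$ may help. The family counts below are made-up data, and the function names are illustrative; `counts[i]` plays the role of $n_i$.

```python
from fractions import Fraction

def mean_family_size(counts):
    """E[X]: size of a randomly chosen family; counts[i] = number of families with i children."""
    m = sum(counts.values())
    return Fraction(sum(i * n for i, n in counts.items()), m)

def mean_size_for_random_child(counts):
    """E[Y]: size of the family of a randomly chosen child."""
    children = sum(i * n for i, n in counts.items())
    return Fraction(sum(i * i * n for i, n in counts.items()), children)

counts = {1: 4, 2: 3, 3: 2, 4: 1}  # hypothetical data: 10 families, 20 children
e_x = mean_family_size(counts)
e_y = mean_size_for_random_child(counts)
print(e_x, e_y)
```

Sampling a child rather than a family weights large families more heavily, which is why the second mean is the larger one.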
4.5. Let $p = P\{X = 1\}$. Then $E[X] = p$ and $\text{Var}(X) = p(1 - p)$, so

$$p = 3p(1 - p)$$

implying that $p = 2/3$. Hence, $P\{X = 0\} = 1/3$.
4.6. If you wager $x$ on a bet that wins the amount wagered with probability $p$ and loses that amount with probability $1 - p$, then your expected winnings are

$$xp - x(1 - p) = (2p - 1)x$$

which is positive (and increasing in $x$) if and only if $p > 1/2$. Thus, if $p \le 1/2$, one maximizes one's expected return by wagering 0, and if $p > 1/2$, one maximizes one's expected return by wagering the maximal possible bet. Therefore, if the information is that the .6 coin was chosen, then you should bet 10; if the information is that the .3 coin was chosen, then you should bet 0. Hence, your expected payoff is

$$\frac{1}{2}(1.2 - 1)10 + \frac{1}{2}0 - C = 1 - C$$

Since your expected payoff is 0 without the information (because in this case the probability of winning is $\frac{1}{2}(.6) + \frac{1}{2}(.3) < 1/2$), it follows that if the information costs less than 1, then it pays to purchase it.

4.7. (a) If you turn over the red paper and observe the value $x$, then your expected return if you switch to the blue paper is

$$2x(1/2) + (x/2)(1/2) = 5x/4 > x$$

Thus, it would always be better to switch.

(b) Suppose the philanthropist writes the amount $x$ on the red paper. Then the amount on the blue paper is either $2x$ or $x/2$. Note that if $x/2 \ge y$, then the amount on the blue paper will be at least $y$ and will thus be accepted. Hence, in this case, the reward is equally likely to be either $2x$ or $x/2$, so

$$E[R_y(x)] = 5x/4, \quad \text{if } x/2 \ge y$$

If $x/2 < y \le 2x$, then the blue paper will be accepted if its value is $2x$ and rejected if it is $x/2$. Therefore,

$$E[R_y(x)] = 2x(1/2) + x(1/2) = 3x/2, \quad \text{if } x/2 < y \le 2x$$

Finally, if $2x < y$, then the blue paper will be rejected. Hence, in this case, the reward is $x$, so

$$R_y(x) = x, \quad \text{if } 2x < y$$

That is, we have shown that when the amount $x$ is written on the red paper, the expected return under the $y$-policy is

$$E[R_y(x)] = \begin{cases} x & \text{if } x < y/2 \\ 3x/2 & \text{if } y/2 \le x < 2y \\ 5x/4 & \text{if } x \ge 2y \end{cases}$$


4.8. Suppose that $n$ independent trials, each of which results in a success with probability $p$, are performed. Then the number of successes will be less than or equal to $i$ if and only if the number of failures is greater than or equal to $n - i$. But since each trial is a failure with probability $1 - p$, it follows that the number of failures is a binomial random variable with parameters $n$ and $1 - p$. Hence,

$$P\{\text{Bin}(n, p) \le i\} = P\{\text{Bin}(n, 1 - p) \ge n - i\} = 1 - P\{\text{Bin}(n, 1 - p) \le n - i - 1\}$$

The final equality follows from the fact that the probability that the number of failures is greater than or equal to $n - i$ is 1 minus the probability that it is less than $n - i$.
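
The identity relating the two binomial distribution functions can be checked exactly for particular parameters. In this sketch `binom_cdf` is an illustrative helper, and the values of `n` and `p` are arbitrary test inputs.

```python
from fractions import Fraction
from math import comb

def binom_cdf(n, p, i):
    """P{Bin(n, p) <= i}, computed exactly; an empty sum (i < 0) gives 0."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(i + 1))

n, p = 9, Fraction(3, 10)  # arbitrary illustrative parameters
for i in range(n + 1):
    left = binom_cdf(n, p, i)
    right = 1 - binom_cdf(n, 1 - p, n - i - 1)
    print(i, left, right)
```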
4.9. Since $E[X] = np$, $\text{Var}(X) = np(1 - p)$, we are given that $np = 6$, $np(1 - p) = 2.4$. Thus, $1 - p = .4$, or $p = .6$, $n = 10$. Hence,

$$P\{X = 5\} = \binom{10}{5}(.6)^5(.4)^5$$


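4.10. Let $X_i$, $i = 1, \ldots, m$, denote the number on the $i$th ball drawn. Then

$$P\{X \le k\} = P\{X_1 \le k, X_2 \le k, \ldots, X_m \le k\} = P\{X_1 \le k\}P\{X_2 \le k\} \cdots P\{X_m \le k\} = \left(\frac{k}{n}\right)^m$$

Therefore,

$$P\{X = k\} = P\{X \le k\} - P\{X \le k - 1\} = \left(\frac{k}{n}\right)^m - \left(\frac{k-1}{n}\right)^m$$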


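4.11. (a) Given that A wins the first game, it will win the series if, from then on, it wins 2 games before team B wins 3 games. Thus,

$$P\{A \text{ wins} \mid A \text{ wins first}\} = \sum_{i=2}^{4}\binom{4}{i}p^i(1 - p)^{4-i}$$

(b)
$$P\{A \text{ wins first} \mid A \text{ wins}\} = \frac{P\{A \text{ wins} \mid A \text{ wins first}\}P\{A \text{ wins first}\}}{P\{A \text{ wins}\}} = \frac{\sum_{i=2}^{4}\binom{4}{i}p^{i+1}(1 - p)^{4-i}}{\sum_{i=3}^{5}\binom{5}{i}p^i(1 - p)^{5-i}}$$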


4.12. To obtain the solution, condition on whether the team wins this weekend:

$$.5\sum_{i=3}^{4}\binom{4}{i}(.4)^i(.6)^{4-i} + .5\sum_{i=3}^{4}\binom{4}{i}(.7)^i(.3)^{4-i}$$


4.13. Let $C$ be the event that the jury makes the correct decision, and let $F$ be the event that four of the judges agreed. Then

$$P(C) = \sum_{i=4}^{7}\binom{7}{i}(.7)^i(.3)^{7-i}$$

Also,

$$P(C|F) = \frac{P(CF)}{P(F)} = \frac{\binom{7}{4}(.7)^4(.3)^3}{\binom{7}{4}(.7)^4(.3)^3 + \binom{7}{3}(.7)^3(.3)^4} = .7$$


4.14. Assuming that the number of hurricanes can be approximated by a Poisson random variable, we obtain the solution

$$\sum_{i=0}^{3} e^{-5.2}(5.2)^i/i!$$

4.15.
$$E[Y] = \sum_{i=1}^{\infty} iP\{X = i\}/P\{X > 0\} = E[X]/P\{X > 0\} = \frac{\lambda}{1 - e^{-\lambda}}$$


4.16. (a) $1/n$

(b) Let $D$ be the event that girl $i$ and girl $j$ choose different boys. Then

$$P(G_iG_j) = P(G_iG_j|D)P(D) + P(G_iG_j|D^c)P(D^c) = (1/n)^2(1 - 1/n) = \frac{n-1}{n^3}$$

Therefore,

$$P(G_i|G_j) = \frac{n-1}{n^2}$$

(c), (d) Because, when $n$ is large, $P(G_i|G_j)$ is small and nearly equal to $P(G_i)$, it follows from the Poisson paradigm that the number of couples is approximately Poisson distributed with mean $\sum_{i=1}^{n} P(G_i) = 1$. Hence, $P_0 \approx e^{-1}$ and $P_k \approx e^{-1}/k!$

(e) To determine the probability that a given set of $k$ girls all are coupled, condition on whether or not $D$ occurs, where $D$ is the event that they all choose different boys. This gives

$$P(G_{i_1} \cdots G_{i_k}) = P(G_{i_1} \cdots G_{i_k}|D)P(D) + P(G_{i_1} \cdots G_{i_k}|D^c)P(D^c) = (1/n)^k\,\frac{n(n-1)\cdots(n-k+1)}{n^k} = \frac{n!}{(n-k)!\,n^{2k}}$$

Therefore,

$$\sum_{i_1 < \cdots < i_k} P(G_{i_1} \cdots G_{i_k}) = \binom{n}{k}P(G_{i_1} \cdots G_{i_k}) = \frac{n!\,n!}{(n-k)!\,(n-k)!\,k!\,n^{2k}}$$

and the inclusion-exclusion identity yields

$$1 - P_0 = P\left(\bigcup_{i=1}^{n} G_i\right) = \sum_{k=1}^{n} (-1)^{k+1}\frac{n!\,n!}{(n-k)!\,(n-k)!\,k!\,n^{2k}}$$


4.17. (a) Because woman $i$ is equally likely to be paired with any of the remaining $2n - 1$ people, $P(W_i) = \frac{1}{2n-1}$.

(b) Because, conditional on $W_j$, woman $i$ is equally likely to be paired with any of $2n - 3$ people, $P(W_i|W_j) = \frac{1}{2n-3}$.

(c) When $n$ is large, the number of wives paired with their husbands will approximately be Poisson with mean $\sum_{i=1}^{n} P(W_i) = \frac{n}{2n-1} \approx 1/2$. Therefore, the probability that there is no such pairing is approximately $e^{-1/2}$.

(d) It reduces to the match problem. 


4.18. (a) $\binom{8}{3}(9/19)^3(10/19)^5(9/19) = \binom{8}{3}(9/19)^4(10/19)^5$

(b) If $W$ is her final winnings and $X$ is the number of bets she makes, then, since she would have won 4 bets and lost $X - 4$ bets, it follows that

$$W = 20 - 5(X - 4) = 40 - 5X$$

Hence,

$$E[W] = 40 - 5E[X] = 40 - 5[4/(9/19)] = -20/9$$


4.19. The probability that a round does not result in an “odd 
person” is equal to 1/4, the probability that all three coins 
land on the same side. 

(a) $(1/4)^2(3/4) = 3/64$

(b) $(1/4)^4 = 1/256$


4.20. Let $q = 1 - p$. Then

$$E[1/X] = \sum_{i=1}^{\infty} \frac{1}{i}q^{i-1}p$$


4.21. Since $\frac{X - b}{a - b}$ will equal 1 with probability $p$ or 0 with probability $1 - p$, it follows that it is a Bernoulli random variable with parameter $p$. Because the variance of such a Bernoulli random variable is $p(1 - p)$, we have

$$p(1 - p) = \text{Var}\left(\frac{X - b}{a - b}\right) = \frac{1}{(a - b)^2}\text{Var}(X - b) = \frac{1}{(a - b)^2}\text{Var}(X)$$

Hence,

$$\text{Var}(X) = (a - b)^2p(1 - p)$$


4.22. Let $X$ denote the number of games that you play and $Y$ the number of games that you lose.

(a) After your fourth game, you will continue to play until you lose. Therefore, $X - 4$ is a geometric random variable with parameter $1 - p$, so

$$E[X] = E[4 + (X - 4)] = 4 + E[X - 4] = 4 + \frac{1}{1 - p}$$

(b) If we let $Z$ denote the number of losses you have in the first 4 games, then $Z$ is a binomial random variable with parameters 4 and $1 - p$. Because $Y = Z + 1$, we have

$$E[Y] = E[Z + 1] = E[Z] + 1 = 4(1 - p) + 1$$


4.23. A total of $n$ white balls will be withdrawn before a total of $m$ black balls if and only if there are at least $n$ white balls in the first $n + m - 1$ withdrawals. (Compare with the problem of the points, Example 4j of Chapter 3.) With $X$ equal to the number of white balls among the first $n + m - 1$ balls withdrawn, $X$ is a hypergeometric random variable, and it follows that

$$P\{X \ge n\} = \sum_{i=n}^{n+m-1} P\{X = i\} = \sum_{i=n}^{n+m-1} \frac{\binom{N}{i}\binom{M}{n+m-1-i}}{\binom{N+M}{n+m-1}}$$


4.24. Because each ball independently goes into urn $i$ with the same probability $p_i$, it follows that $X_i$ is a binomial random variable with parameters $n = 10$, $p = p_i$.

First note that $X_i + X_j$ is the number of balls that go into either urn $i$ or urn $j$. Then, because each of the 10 balls independently goes into one of these urns with probability $p_i + p_j$, it follows that $X_i + X_j$ is a binomial random variable with parameters 10 and $p_i + p_j$.

By the same logic, $X_1 + X_2 + X_3$ is a binomial random variable with parameters 10 and $p_1 + p_2 + p_3$. Therefore,

$$P\{X_1 + X_2 + X_3 = 7\} = \binom{10}{7}(p_1 + p_2 + p_3)^7(p_4 + p_5)^3$$
4.25. Let $X_i$ equal 1 if person $i$ has a match, and let it equal 0 otherwise. Then

$$X = \sum_{i=1}^{n} X_i$$

is the number of matches. Taking expectations gives

$$E[X] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} P\{X_i = 1\} = \sum_{i=1}^{n} 1/n = 1$$

where the final equality follows because person $i$ is equally likely to end up with any of the $n$ hats.

To compute $\text{Var}(X)$, we use Equation (9.1), which states that

$$E[X^2] = \sum_{i=1}^{n} E[X_i] + \sum_{i=1}^{n}\sum_{j \ne i} E[X_iX_j]$$

Now, for $i \ne j$,

$$E[X_iX_j] = P\{X_i = 1, X_j = 1\} = P\{X_i = 1\}P\{X_j = 1 \mid X_i = 1\} = \frac{1}{n}\,\frac{1}{n-1}$$

Hence,

$$E[X^2] = 1 + \sum_{i=1}^{n}\sum_{j \ne i} \frac{1}{n(n-1)} = 1 + n(n-1)\frac{1}{n(n-1)} = 2$$

which yields

$$\text{Var}(X) = 2 - 1^2 = 1$$
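
For small $n$ the mean and variance of the number of matches can be confirmed by brute force over all $n!$ hat assignments. This is only a sanity check; the function name is illustrative, and hats and people are indexed from 0.

```python
from fractions import Fraction
from itertools import permutations

def match_moments(n):
    """Exact mean and variance of the number of matches among n people."""
    counts = [sum(1 for person, hat in enumerate(perm) if person == hat)
              for perm in permutations(range(n))]
    mean = Fraction(sum(counts), len(counts))
    second_moment = Fraction(sum(c * c for c in counts), len(counts))
    return mean, second_moment - mean ** 2

for n in range(2, 8):
    print(n, match_moments(n))
```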


4.26. With $q = 1 - p$, we have, on the one hand,

$$P(E) = \sum_{i=1}^{\infty} P\{X = 2i\} = \sum_{i=1}^{\infty} pq^{2i-1} = \frac{pq}{1 - q^2} = \frac{pq}{(1 - q)(1 + q)} = \frac{q}{1 + q}$$

On the other hand,

$$P(E) = P(E \mid X = 1)p + P(E \mid X > 1)q = qP(E \mid X > 1)$$

However, given that the first trial is not a success, the number of trials needed for a success is 1 plus the geometrically distributed number of additional trials required. Therefore,

$$P(E \mid X > 1) = P(X + 1 \text{ is even}) = P(E^c) = 1 - P(E)$$

which yields $P(E) = q/(1 + q)$.


4.27. (a) In order for $N = 6$ one of the teams must be up 3 games to 2 after the first 5 games and then must win game 6. This gives

$$P(N = 6) = \binom{5}{3}p^3(1 - p)^2p + \binom{5}{3}(1 - p)^3p^2(1 - p) = 10(p^4(1 - p)^2 + (1 - p)^4p^2)$$

On the other hand, $N = 7$ if each team wins 3 of the first 6 games. Hence,

$$P(N = 7) = \binom{6}{3}p^3(1 - p)^3 = 20p^3(1 - p)^3$$

Hence,

$$P(N = 6) - P(N = 7) = p^2(1 - p)^2(10p^2 + 10(1 - p)^2 - 20p(1 - p)) = p^2(1 - p)^2(40p^2 - 40p + 10)$$

Calculus shows that $40p^2 - 40p + 10$ is minimized when $p = 1/2$ with the minimizing value equal to 0.

(b) In order for $N = 6$ one of the teams must be up 3 games to 2 after the first 5 games, and because when $p = 1/2$ each team is equally likely to win game 6, it is just as likely that $N$ will equal 6 as that it will equal 7.

(c) Imagine that the teams continue to play even after one of them has won the series. The team that wins the first game must win at least 3 of the next 6 games played to win the series. Hence, the desired answer is $\sum_{i=3}^{6}\binom{6}{i}(1/2)^6 = 42/64$.
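
The algebra in parts (a) and (c) can be spot-checked with exact fractions. The sketch assumes a best-of-seven series in which one team wins each game independently with probability $p$; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def p_six(p):
    """P(N = 6): a team leads 3-2 after five games and then wins game 6."""
    return comb(5, 3) * (p ** 4 * (1 - p) ** 2 + (1 - p) ** 4 * p ** 2)

def p_seven(p):
    """P(N = 7): each team wins 3 of the first 6 games."""
    return comb(6, 3) * p ** 3 * (1 - p) ** 3

def difference_formula(p):
    return p ** 2 * (1 - p) ** 2 * (40 * p ** 2 - 40 * p + 10)

grid = [Fraction(k, 20) for k in range(21)]
gaps = [p_six(p) - p_seven(p) for p in grid]

# Part (c): winner of game 1 wins at least 3 of the next 6 games, p = 1/2.
part_c = Fraction(sum(comb(6, i) for i in range(3, 7)), 2 ** 6)
print(min(gaps), part_c)
```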


4.28. (a) The negative binomial represents the number of 
balls withdrawn in a similar experiment but with the excep- 
tion that the withdrawn ball would be replaced before the 
next drawing. 

(b) Using the hint, we note that $X = r$ if the first $r - 1$ balls withdrawn contain exactly $k - 1$ white balls and the next withdrawn ball is white. Hence,

$$P\{X = r\} = \frac{\binom{n}{k-1}\binom{m}{r-k}}{\binom{n+m}{r-1}}\,\frac{n-k+1}{n+m-r+1}, \quad k \le r \le m + k$$


4.29. (a) +(3) (<1/3)52/3)5 + (1/2)8 + (3/4)5(1/4)3) 
(by 4[(2/3)4(1/3) + 1/25 + 1/4)4B/4)] 


4.30. Binomial with parameters n and 1 — p. 


4.31. 
(Cee) 
Ca 


PX =k)= 


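4.32. $X = i$ if the first $i - 1$ balls consist of $r - 1$ red and $i - r$ blue balls, and the next ball is red. Hence,

$$P\{X = i\} = \frac{\binom{n}{r-1}\binom{m}{i-r}}{\binom{n+m}{i-1}}\,\frac{n-r+1}{n+m-i+1}$$

Let $Y$ be the number of balls that have to be removed until a total of $s$ blue balls have been removed. Then, $V = \min(X, Y)$ and, for $i < r + s$,

$$P\{V = i\} = P\{X = i\} + P\{Y = i\} = \frac{\binom{n}{r-1}\binom{m}{i-r}}{\binom{n+m}{i-1}}\,\frac{n-r+1}{n+m-i+1} + \frac{\binom{n}{i-s}\binom{m}{s-1}}{\binom{n+m}{i-1}}\,\frac{m-s+1}{n+m-i+1}$$

Now, $Z = \max(X, Y)$. Because $Z \ge r + s$, and $Z = i \ge r + s$ either if $X = i$ or if $Y = i$, we have, for $i \ge r + s$, that

$$P\{Z = i\} = P\{X = i\} + P\{Y = i\} = \frac{\binom{n}{r-1}\binom{m}{i-r}}{\binom{n+m}{i-1}}\,\frac{n-r+1}{n+m-i+1} + \frac{\binom{n}{i-s}\binom{m}{s-1}}{\binom{n+m}{i-1}}\,\frac{m-s+1}{n+m-i+1}$$

$X < Y$ if the $r$th red ball is removed before a total of $r + s$ balls have been removed. That is,

$$P\{X < Y\} = P\{X < r + s\} = \sum_{i=r}^{r+s-1} \frac{\binom{n}{r-1}\binom{m}{i-r}}{\binom{n+m}{i-1}}\,\frac{n-r+1}{n+m-i+1}$$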


Chapter 5 

5.1. Let X be the number of minutes played. 

(a) P{X > 15} = 1 − P{X ≤ 15} = 1 − 5(.025) = .875
(b) P{20 < X < 35} = 10(.05) + 5(.025) = .625

(c) P{X < 30} = 10(.025) + 10(.05) = .75

(d) P{X > 36} = 4(.025) = .1

5.2. (a) $1 = \int_0^1 cx^n\,dx = c/(n + 1) \Rightarrow c = n + 1$

(b) $P\{X > x\} = (n + 1)\int_x^1 t^n\,dt = t^{n+1}\Big|_x^1 = 1 - x^{n+1}$


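5.3. First, let us find $c$ by using

$$1 = \int_0^2 cx^4\,dx = 32c/5 \Rightarrow c = 5/32$$

(a) $E[X] = \frac{5}{32}\int_0^2 x^5\,dx = \frac{5}{32}\cdot\frac{64}{6} = 5/3$

(b) $E[X^2] = \frac{5}{32}\int_0^2 x^6\,dx = \frac{5}{32}\cdot\frac{128}{7} = 20/7 \Rightarrow \text{Var}(X) = 20/7 - (5/3)^2 = 5/63$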
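5.4. Since

$$1 = \int_0^1 (ax + bx^2)\,dx = a/2 + b/3$$
$$.6 = \int_0^1 (ax^2 + bx^3)\,dx = a/3 + b/4$$

we obtain $a = 3.6$, $b = -2.4$. Hence,

(a) $P\{X < 1/2\} = \int_0^{1/2} (3.6x - 2.4x^2)\,dx = (1.8x^2 - .8x^3)\Big|_0^{1/2} = .35$

(b) $E[X^2] = \int_0^1 (3.6x^3 - 2.4x^4)\,dx = .42 \Rightarrow \text{Var}(X) = .42 - (.6)^2 = .06$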

5.5. For $i = 1, \ldots, n$,

$$P\{X = i\} = P\{\text{Int}(nU) = i - 1\} = P\{i - 1 \le nU < i\} = P\left\{\frac{i-1}{n} \le U < \frac{i}{n}\right\} = 1/n$$


5.6. If you bid $x$, $70 \le x \le 140$, then you will either win the bid and make a profit of $x - 100$ with probability $(140 - x)/70$ or lose the bid and make a profit of 0 otherwise. Therefore, your expected profit if you bid $x$ is

$$\frac{1}{70}(x - 100)(140 - x) = \frac{1}{70}(240x - x^2 - 14000)$$

Differentiating and setting the preceding equal to 0 gives

$$240 - 2x = 0$$

Therefore, you should bid $120,000. Your expected profit will be 40/7 thousand dollars.
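
A quick numerical check of the optimal bid, under the model used in the solution (bids and profits in thousands of dollars, and a win probability of $(140 - x)/70$ for a bid $x$ between 70 and 140). The grid search over whole thousands is an illustrative simplification, not part of the original argument.

```python
def expected_profit(x):
    """Expected profit, in thousands, from bidding x thousand (70 <= x <= 140)."""
    return (x - 100) * (140 - x) / 70

# Search bids in steps of one thousand dollars.
best_bid = max(range(70, 141), key=expected_profit)
print(best_bid, expected_profit(best_bid))
```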


5.7. (a) P{U > .1} = 9/10 

(b) P{U > .2|U > .1} = P{U > .2}/P{U > .1} = 8/9

(c) P{U > .3|U > .2, U > .1} = P{U > .3}/P{U > .2} = 7/8
(d) P{U > .3} = 7/10 

The answer to part (d) could also have been obtained by 
multiplying the probabilities in parts (a), (b), and (c). 


5.8. Let X be the test score, and let Z = (X — 100)/15. Note 
that Z is a standard normal random variable. 
(a) P{X > 125} = P{Z > 25/15} = .0478 


(b) P{90 < X < 110} = P{-10/15 < Z < 10/15}
= P{Z < 2/3} — P{Z < —2/3} 
= P{Z < 2/3} — [1 — P{Z < 2/3}] 
= .4950 


5.9. Let $X$ be the travel time. We want to find $x$ such that

$$P\{X > x\} = .05$$

which is equivalent to

$$P\left\{\frac{X - 40}{7} > \frac{x - 40}{7}\right\} = .05$$

That is, we need to find $x$ such that

$$P\left\{Z > \frac{x - 40}{7}\right\} = .05$$

where $Z$ is a standard normal random variable. But

$$P\{Z > 1.645\} = .05$$

Thus,

$$\frac{x - 40}{7} = 1.645 \quad \text{or} \quad x = 51.515$$

Therefore, you should leave no later than 8.485 minutes after 12 PM.
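
The quantile can be reproduced with the standard library. This sketch assumes a normally distributed travel time with mean 40 and standard deviation 7 minutes and an appointment at 1 PM, values chosen because they reproduce the stated answer of 8.485 minutes; the variable names are illustrative.

```python
from statistics import NormalDist

travel_time = NormalDist(mu=40, sigma=7)   # assumed travel-time model, in minutes

# Smallest allowance x with P{X > x} = .05, i.e. the 95th percentile.
allowance = travel_time.inv_cdf(0.95)

# Minutes after 12 PM by which to leave for a 1 PM appointment (assumed).
latest_departure = 60 - allowance
print(round(allowance, 3), round(latest_departure, 3))
```

The small difference from 8.485 comes from using the unrounded normal quantile instead of 1.645.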


5.10. Let X be the tire life in units of one thousand, and let 
Z = (X — 34)/4. Note that Z is a standard normal random 
variable. 

(a) P{X > 40} = P{Z > 1.5} ≈ .0668

(b) P{30 < X < 35} = P{-1 < Z < .25} = P{Z < .25} − P{Z > 1} ≈ .44


(c) P{X > 40|X > 30} = P{X > 40}/P{X > 30}
= P{Z > 1.5}/P{Z > −1} ≈ .079


5.11. Let $X$ be next year's rainfall and let $Z = (X - 40.2)/8.4$.
(a) $P\{X > 44\} = P\{Z > 3.8/8.4\} \approx P\{Z > .4524\} \approx .3255$

(b) $\binom{7}{3}(.3255)^3(.6745)^4$


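5.12. Let $M_i$ and $W_i$ denote, respectively, the numbers of men and women in the samples that earn, in units of $1,000, at least $i$ per year. Also, let $Z$ be a standard normal random variable.

(a)
$$P\{W_{25} \ge 70\} = P\{W_{25} \ge 69.5\} = P\left\{\frac{W_{25} - 200(.34)}{\sqrt{200(.34)(.66)}} \ge \frac{69.5 - 200(.34)}{\sqrt{200(.34)(.66)}}\right\} \approx P\{Z \ge .2239\} \approx .4114$$

(b)
$$P\{M_{25} \le 120\} = P\{M_{25} \le 120.5\} = P\left\{\frac{M_{25} - (200)(.587)}{\sqrt{(200)(.587)(.413)}} \le \frac{120.5 - (200)(.587)}{\sqrt{(200)(.587)(.413)}}\right\} \approx P\{Z \le .4452\} \approx .6719$$

(c)
$$P\{M_{20} \ge 150\} = P\{M_{20} \ge 149.5\} = P\left\{\frac{M_{20} - (200)(.745)}{\sqrt{(200)(.745)(.255)}} \ge \frac{149.5 - (200)(.745)}{\sqrt{(200)(.745)(.255)}}\right\} \approx P\{Z \ge .0811\} \approx .4677$$

$$P\{W_{20} \ge 100\} = P\{W_{20} \ge 99.5\} = P\left\{\frac{W_{20} - (200)(.534)}{\sqrt{(200)(.534)(.466)}} \ge \frac{99.5 - (200)(.534)}{\sqrt{(200)(.534)(.466)}}\right\} \approx P\{Z \ge -1.0348\} \approx .8496$$

Hence,

$$P\{M_{20} \ge 150\}P\{W_{20} \ge 100\} \approx .3974$$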


5.13. The lack of memory property of the exponential gives the result $e^{-4/5}$.

5.14. (a) $e^{-2^2} = e^{-4}$

(b) $F(3) - F(1) = e^{-1} - e^{-9}$

(c) $\lambda(t) = 2te^{-t^2}/e^{-t^2} = 2t$

(d) Let $Z$ be a standard normal random variable. Use the identity $E[X] = \int_0^\infty P\{X > x\}\,dx$ to obtain

$$E[X] = \int_0^\infty e^{-x^2}\,dx = 2^{-1/2}\int_0^\infty e^{-y^2/2}\,dy = 2^{-1/2}\sqrt{2\pi}\,P\{Z > 0\} = \sqrt{\pi}/2$$

(e) Use the result of Theoretical Exercise 5.5 to obtain

$$E[X^2] = \int_0^\infty 2xe^{-x^2}\,dx = -e^{-x^2}\Big|_0^\infty = 1$$

Hence, $\text{Var}(X) = 1 - \pi/4$.
5.15. (a) $P\{X > 6\} = \exp\{-\int_0^6 \lambda(t)\,dt\} = e^{-3.45}$

(b)
$$P\{X < 8 \mid X > 6\} = 1 - P\{X > 8 \mid X > 6\} = 1 - P\{X > 8\}/P\{X > 6\} = 1 - e^{-5.65}/e^{-3.45} \approx .8892$$

5.16. For $x \ge 0$,

$$F_{1/X}(x) = P\{1/X \le x\} = P\{X \le 0\} + P\{X \ge 1/x\} = 1/2 + 1 - F_X(1/x)$$

Differentiation yields

$$f_{1/X}(x) = x^{-2}f_X(1/x) = \frac{1}{x^2\pi(1 + (1/x)^2)} = f_X(x)$$

The proof when $x < 0$ is similar.


5.17. If $X$ denotes the number of the first $n$ bets that you win, then the amount that you will be winning after $n$ bets is

$$35X - (n - X) = 36X - n$$

Thus, we want to determine

$$a = P\{36X - n > 0\} = P\{X > n/36\}$$

when $X$ is a binomial random variable with parameters $n$ and $p = 1/38$.

(a) When $n = 34$,

$$a = P\{X \ge 1\} = P\{X > .5\} \quad \text{(the continuity correction)}$$
$$= P\left\{\frac{X - 34/38}{\sqrt{34(1/38)(37/38)}} > \frac{.5 - 34/38}{\sqrt{34(1/38)(37/38)}}\right\} = P\left\{\frac{X - 34/38}{\sqrt{34(1/38)(37/38)}} > -.4229\right\} \approx \Phi(.4229) \approx .6638$$

(Because you will be ahead after 34 bets if you win at least 1 bet, the exact probability in this case is $1 - (37/38)^{34} = .5961$.)

(b) When $n = 1000$,

$$a = P\{X > 27.5\} = P\left\{\frac{X - 1000/38}{\sqrt{1000(1/38)(37/38)}} > \frac{27.5 - 1000/38}{\sqrt{1000(1/38)(37/38)}}\right\} \approx 1 - \Phi(.2339) \approx .4075$$

The exact probability, namely the probability that a binomial $n = 1000$, $p = 1/38$ random variable is greater than 27, is .3961.

(c) When $n = 100{,}000$,

$$a = P\{X > 2777.5\} = P\left\{\frac{X - 100000/38}{\sqrt{100000(1/38)(37/38)}} > \frac{2777.5 - 100000/38}{\sqrt{100000(1/38)(37/38)}}\right\} \approx 1 - \Phi(2.883) \approx .0020$$

The exact probability in this case is .0021.
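
The normal approximations and the exact binomial probabilities quoted above can be recomputed with the standard library. This is a rough sketch: the helper names are illustrative, and the exact tail is summed in log space (via `lgamma`) so that it stays numerically stable for large $n$.

```python
from math import erf, exp, lgamma, log, sqrt

P_WIN = 1 / 38  # chance of winning a single bet

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_approx(n, p=P_WIN):
    """Approximate P{X > n/36} using the continuity correction."""
    cutoff = n // 36 + 0.5  # X > n/36 means X >= n // 36 + 1
    z = (cutoff - n * p) / sqrt(n * p * (1 - p))
    return 1 - phi(z)

def exact(n, p=P_WIN):
    """Exact P{X > n/36} for X binomial(n, p)."""
    def log_pmf(j):
        return (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                + j * log(p) + (n - j) * log(1 - p))
    return 1 - sum(exp(log_pmf(j)) for j in range(n // 36 + 1))

for n in (34, 1000, 100000):
    print(n, round(normal_approx(n), 4), round(exact(n), 4))
```

The comparison shows the approximation improving as $n$ grows, which is the point of the three parts.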


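5.18. If $X$ denotes the lifetime of the battery, then the desired probability, $P\{X > s + t \mid X > t\}$, can be determined as follows:

$$P\{X > s + t \mid X > t\} = \frac{P\{X > s + t, X > t\}}{P\{X > t\}} = \frac{P\{X > s + t\}}{P\{X > t\}}$$
$$= \frac{P\{X > s + t \mid \text{battery is type 1}\}p_1 + P\{X > s + t \mid \text{battery is type 2}\}p_2}{P\{X > t \mid \text{battery is type 1}\}p_1 + P\{X > t \mid \text{battery is type 2}\}p_2}$$
$$= \frac{e^{-\lambda_1(s+t)}p_1 + e^{-\lambda_2(s+t)}p_2}{e^{-\lambda_1t}p_1 + e^{-\lambda_2t}p_2}$$

Another approach is to directly condition on the type of battery and then use the lack-of-memory property of exponential random variables. That is, we could do the following:

$$P\{X > s + t \mid X > t\} = P\{X > s + t \mid X > t, \text{type 1}\}P\{\text{type 1} \mid X > t\} + P\{X > s + t \mid X > t, \text{type 2}\}P\{\text{type 2} \mid X > t\}$$
$$= e^{-\lambda_1s}P\{\text{type 1} \mid X > t\} + e^{-\lambda_2s}P\{\text{type 2} \mid X > t\}$$

Now for $i = 1, 2$, use

$$P\{\text{type } i \mid X > t\} = \frac{P\{\text{type } i, X > t\}}{P\{X > t\}} = \frac{P\{X > t \mid \text{type } i\}p_i}{P\{X > t \mid \text{type 1}\}p_1 + P\{X > t \mid \text{type 2}\}p_2} = \frac{e^{-\lambda_it}p_i}{e^{-\lambda_1t}p_1 + e^{-\lambda_2t}p_2}$$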


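5.19. Let $X_i$ be an exponential random variable with mean $i$, $i = 1, 2$.

(a) The value $c$ should be such that $P\{X_1 > c\} = .05$. Therefore,

$$e^{-c} = .05 = 1/20$$

or $c = \log(20) = 2.996$.

(b) $P\{X_2 > c\} = e^{-c/2} = \frac{1}{\sqrt{20}} = .2236$

5.20. (a)

$$E[(Z - c)^+] = \frac{1}{\sqrt{2\pi}}\int_c^\infty (x - c)e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}}\int_c^\infty xe^{-x^2/2}\,dx - \frac{1}{\sqrt{2\pi}}\int_c^\infty ce^{-x^2/2}\,dx$$
$$= \frac{1}{\sqrt{2\pi}}e^{-c^2/2} - c(1 - \Phi(c))$$

(b) Using the fact that $X$ has the same distribution as $\mu + \sigma Z$, where $Z$ is a standard normal random variable, yields

$$E[(X - c)^+] = E[(\mu + \sigma Z - c)^+] = E\left[\left(\sigma\left(Z - \frac{c - \mu}{\sigma}\right)\right)^+\right] = \sigma E[(Z - a)^+] = \sigma\left[\frac{1}{\sqrt{2\pi}}e^{-a^2/2} - a(1 - \Phi(a))\right]$$

where $a = \frac{c - \mu}{\sigma}$.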


5.21. Only (b) is true. 
5.22. (a) If $b > 0$, then for $0 < x < b$,

$$P\{bU < x\} = P\{U < x/b\} = x/b.$$

Hence,

$$f_{bU}(x) = 1/b, \quad 0 < x < b$$

The argument when $b < 0$ is similar.
(b) For $a < x < 1 + a$,

$$P\{a + U < x\} = P\{U < x - a\} = x - a$$

Differentiation yields

$$f_{a+U}(x) = 1, \quad a < x < 1 + a$$

(c) $a + (b - a)U$
(d) For $0 < x < 1/2$,

$$P\{\min(U, 1 - U) < x\} = P(\{U < x\} \cup \{U > 1 - x\}) = P\{U < x\} + P\{U > 1 - x\} = 2x$$

Differentiating gives

$$f_{\min(U, 1-U)}(x) = 2, \quad 0 < x < 1/2$$

(e) Using that $\max(U, 1 - U) = 1 - \min(U, 1 - U)$, the result follows from (a), (b), and (d). A direct argument is that, for $1/2 < x < 1$,

$$P\{\max(U, 1 - U) < x\} = 1 - P\{\max(U, 1 - U) > x\} = 1 - P(\{U > x\} \cup \{U < 1 - x\}) = 1 - 2(1 - x) = 2x - 1$$

Hence,

$$f_{\max(U, 1-U)}(x) = 2, \quad 1/2 < x < 1$$


5.23. (a) $\int_{-\infty}^0 e^x\,dx + 1 + \int_1^\infty e^{-(x-1)}\,dx = 1 + 1 + 1 = 3$.
(b) $E[X] = 1/2$


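5.24. (a) $\frac{\theta}{1+\theta}\int_0^\infty (1 + x)\theta e^{-\theta x}\,dx = \frac{\theta}{1+\theta}\left(1 + \frac{1}{\theta}\right) = 1$.

(b) With $Y$ being exponential with rate $\theta$, $E[X] = \frac{\theta}{1+\theta}(E[Y] + E[Y^2]) = \frac{\theta + 2}{\theta(1+\theta)}$.

(c) $E[X^2] = \frac{\theta}{1+\theta}(E[Y^2] + E[Y^3]) = \frac{\theta}{1+\theta}\left(\frac{2}{\theta^2} + \frac{6}{\theta^3}\right)$. Hence,

$$\text{Var}(X) = \frac{2\theta + 6}{\theta^2(1+\theta)} - \frac{(\theta + 2)^2}{\theta^2(1+\theta)^2}$$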


Chapter 6 
6.1. (a) $3C + 6C = 1 \Rightarrow C = 1/9$
(b) Let $p(i, j) = P\{X = i, Y = j\}$. Then

$$p(1, 1) = 4/9, \quad p(1, 0) = 2/9, \quad p(0, 1) = 1/9, \quad p(0, 0) = 2/9$$


>. re 
so 1/929) 


(12)! 


SESE fi 312 
@ Gps l/9) 


12 12 . Dj 
@X(; )eraa 


(c) 


6.2. (a) With $p_j = P\{XYZ = j\}$, we have

$$p_6 = p_2 = p_4 = p_{12} = 1/4$$

Hence,

$$E[XYZ] = (6 + 2 + 4 + 12)/4 = 6$$

(b) With $q_j = P\{XY + XZ + YZ = j\}$, we have

$$q_{11} = q_5 = q_8 = q_{16} = 1/4$$

Hence,

$$E[XY + XZ + YZ] = (11 + 5 + 8 + 16)/4 = 10$$


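6.3. In this solution, we will make use of the identity

$$\int_0^\infty e^{-x}x^n\,dx = n!$$

which follows because $e^{-x}x^n/n!$, $x > 0$, is the density function of a gamma random variable with parameters $n + 1$ and $\lambda = 1$ and must thus integrate to 1.

(a)
$$1 = C\int_0^\infty e^{-y}\int_{-y}^{y} (y - x)\,dx\,dy = C\int_0^\infty e^{-y}2y^2\,dy = 4C$$

Hence, $C = 1/4$.

(b) Since the joint density is nonzero only when $y > x$ and $y > -x$, we have, for $x > 0$,

$$f_X(x) = \frac{1}{4}\int_x^\infty (y - x)e^{-y}\,dy = \frac{1}{4}\int_0^\infty ue^{-(x+u)}\,du = \frac{1}{4}e^{-x}$$

For $x < 0$,

$$f_X(x) = \frac{1}{4}\int_{-x}^\infty (y - x)e^{-y}\,dy = \frac{1}{4}\left[-ye^{-y} - e^{-y} + xe^{-y}\right]_{-x}^{\infty} = (-2xe^x + e^x)/4$$

(c) $f_Y(y) = \frac{1}{4}e^{-y}\int_{-y}^{y} (y - x)\,dx = \frac{1}{2}y^2e^{-y}$

(d)
$$E[X] = \frac{1}{4}\left[\int_0^\infty xe^{-x}\,dx + \int_{-\infty}^0 (-2x^2e^x + xe^x)\,dx\right] = \frac{1}{4}\left[1 - \int_0^\infty (2y^2e^{-y} + ye^{-y})\,dy\right] = -1$$

(e) $E[Y] = \frac{1}{2}\int_0^\infty y^3e^{-y}\,dy = 3$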


6.4. The multinomial random variables $X_i$, $i = 1, \ldots, r$, represent the numbers of each of the types of outcomes $1, \ldots, r$ that occur in $n$ independent trials when each trial results in one of the outcomes $1, \ldots, r$ with respective probabilities $p_1, \ldots, p_r$. Now, say that a trial results in a category 1 outcome if that trial resulted in any of the outcome types $1, \ldots, r_1$; say that a trial results in a category 2 outcome if that trial resulted in any of the outcome types $r_1 + 1, \ldots, r_1 + r_2$; and so on. With these definitions, $Y_1, \ldots, Y_k$ represent the numbers of category 1 outcomes, category 2 outcomes, up to category $k$ outcomes when $n$ independent trials that each result in one of the categories $1, \ldots, k$ are performed, where the probability of category $i$, $i = 1, \ldots, k$, is the sum of the $p_j$ over the outcome types $j$ that belong to category $i$. But by definition, such a vector has a multinomial distribution.
6.5. (a) Letting $p_j = P\{XYZ = j\}$, we have

$$p_1 = 1/8,\quad p_2 = 3/8,\quad p_4 = 3/8,\quad p_8 = 1/8$$

(b) Letting $p_j = P\{XY + XZ + YZ = j\}$, we have

$$p_3 = 1/8,\quad p_5 = 3/8,\quad p_8 = 3/8,\quad p_{12} = 1/8$$

(c) Letting $p_j = P\{X^2 + YZ = j\}$, we have

$$p_2 = 1/8,\quad p_3 = 1/4,\quad p_5 = 1/4,\quad p_6 = 1/4,\quad p_8 = 1/8$$
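The three mass functions above can be checked by brute-force enumeration. The sketch below assumes, as the listed probabilities suggest, that X, Y, and Z are independent and each equally likely to be 1 or 2; that assumption and the helper name `pmf` are illustrative and are not taken from the problem statement.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def pmf(func):
    """Exact pmf of func(X, Y, Z), assuming X, Y, Z are independent
    and each uniform on {1, 2} (an assumption made for this sketch)."""
    counts = Counter(func(x, y, z) for x, y, z in product((1, 2), repeat=3))
    return {value: Fraction(n, 8) for value, n in sorted(counts.items())}

pmf_a = pmf(lambda x, y, z: x * y * z)              # part (a): XYZ
pmf_b = pmf(lambda x, y, z: x * y + x * z + y * z)  # part (b): XY + XZ + YZ
pmf_c = pmf(lambda x, y, z: x * x + y * z)          # part (c): X^2 + YZ
```

Each dictionary maps a possible value j to its probability, so the entries can be compared directly with the listed $p_j$.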


6.6. (a)
$$1 = \int_0^1 \int_1^5 (x/5 + cy)\,dy\,dx = \int_0^1 (4x/5 + 12c)\,dx = 12c + 2/5$$

Hence, $c = 1/20$.

(b) No, the density does not factor.

(c)
$$P\{X + Y > 3\} = \int_0^1 \int_{3-x}^5 (x/5 + y/20)\,dy\,dx = \int_0^1 \left[(2 + x)x/5 + 25/40 - (3 - x)^2/40\right] dx$$

$$= 1/5 + 1/15 + 5/8 - 19/120 = 11/15$$


6.7. (a) Yes, the joint density function factors. 
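(b) $f_X(x) = x\int_0^2 y\,dy = 2x$, $0 < x < 1$

(c) $f_Y(y) = y\int_0^1 x\,dx = y/2$, $0 < y < 2$

(d)
$$P\{X < x, Y < y\} = P\{X < x\}P\{Y < y\} = \min(1, x^2)\,\min(1, y^2/4),\quad x > 0,\ y > 0$$

(e) $E[Y] = \int_0^2 y^2/2\,dy = 4/3$

(f)
$$P\{X + Y < 1\} = \int_0^1 x \int_0^{1-x} y\,dy\,dx = \frac{1}{2}\int_0^1 x(1 - x)^2\,dx = 1/24$$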


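6.8. Let $T_i$ denote the time at which a shock of type $i$, $i = 1, 2, 3$, occurs. For $s > 0$, $t > 0$,

$$\begin{aligned}
P\{X_1 > s, X_2 > t\} &= P\{T_1 > s, T_2 > t, T_3 > \max(s, t)\} \\
&= P\{T_1 > s\}P\{T_2 > t\}P\{T_3 > \max(s, t)\} \\
&= \exp\{-\lambda_1 s\}\exp\{-\lambda_2 t\}\exp\{-\lambda_3 \max(s, t)\} \\
&= \exp\{-(\lambda_1 s + \lambda_2 t + \lambda_3 \max(s, t))\}
\end{aligned}$$

6.9. (a) No, advertisements on pages having many ads are less likely to be chosen than are ones on pages with few ads.

(b) $\dfrac{1}{m}\,\dfrac{n(i)}{n}$

(c) $\sum_{i=1}^{m} \dfrac{1}{m}\,\dfrac{n(i)}{n} = \bar{n}/n$, where $\bar{n} = \sum_{i=1}^{m} n(i)/m$

(d) $(1 - \bar{n}/n)^{k-1}\,\dfrac{1}{m}\,\dfrac{n(i)}{n}\,\dfrac{1}{n(i)} = (1 - \bar{n}/n)^{k-1}/(nm)$

(e) $\sum_{k=1}^{\infty} \dfrac{1}{nm}(1 - \bar{n}/n)^{k-1} = \dfrac{1}{\bar{n}m}$

(f) The number of iterations is geometric with mean $n/\bar{n}$.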


6.10. (a) $P\{X = i\} = 1/m$, $i = 1,\ldots,m$

(b) Step 2. Generate a uniform (0, 1) random variable U. If $U < n(X)/n$, go to step 3. Otherwise return to step 1.
Step 3. Generate a uniform (0, 1) random variable U, and select the element on page X in position $[n(X)U] + 1$.
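The accept/reject steps above can be sketched in code. This is a minimal illustration, not the book's own program: the function name `choose_ad`, the list `n_per_page` holding the counts n(i), and the bound `n_max` (the value n with n(i) ≤ n) are hypothetical names, and step 1 is assumed to pick the page as [mU] + 1.

```python
import random

def choose_ad(n_per_page, n_max, rng=random):
    """Return (page, position), both 1-based, so that every ad is equally likely.

    n_per_page[i - 1] is n(i); n_max is an assumed upper bound n on all n(i).
    """
    m = len(n_per_page)
    while True:
        # Step 1: X = floor(m * U) + 1 is uniform on 1..m.
        page = int(m * rng.random()) + 1
        n_x = n_per_page[page - 1]
        # Step 2: accept page X with probability n(X) / n.
        if rng.random() < n_x / n_max:
            # Step 3: position floor(n(X) * U) + 1 on the accepted page.
            return page, int(n_x * rng.random()) + 1
```

For example, `choose_ad([1, 2, 3], 3)` returns one of the six (page, position) pairs; over many calls each pair should appear with relative frequency close to 1/6.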


6.11. Yes, they are independent. This can be easily seen by considering the equivalent question of whether $X_N$ is independent of N. But this is indeed so, since knowing when the first random variable greater than c occurs does not affect the probability distribution of its value, which is the uniform distribution on (c, 1).


6.12. Let $p_i$ denote the probability of obtaining $i$ points on a single throw of the dart. Then

$$\begin{aligned}
p_{30} &= \pi/36 \\
p_{20} &= 4\pi/36 - p_{30} = \pi/12 \\
p_{10} &= 9\pi/36 - p_{20} - p_{30} = 5\pi/36 \\
p_0 &= 1 - p_{10} - p_{20} - p_{30} = 1 - \pi/4
\end{aligned}$$

(a) $\pi/12$
(b) $\pi/9$
(c) $1 - \pi/4$
(d) $\pi(30/36 + 20/12 + 50/36) = 35\pi/9$
(e) $(\pi/4)^2$
(f) $2(\pi/36)(1 - \pi/4) + 2(\pi/12)(5\pi/36)$


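6.13. Let Z be a standard normal random variable.

(a)
$$P\left\{\sum_{i=1}^{4} X_i > 0\right\} = P\left\{\frac{\sum_{i=1}^{4} X_i - 6}{\sqrt{24}} > \frac{-6}{\sqrt{24}}\right\} \approx P\{Z > -1.2247\} \approx .8897$$

(b)
$$P\left\{\sum_{i=1}^{4} X_i > 0 \,\Big|\, \sum_{i=1}^{2} X_i = -5\right\} = P\{X_3 + X_4 > 5\} = P\left\{\frac{X_3 + X_4 - 3}{\sqrt{12}} > \frac{2}{\sqrt{12}}\right\} \approx P\{Z > .5774\} \approx .2818$$

(c)
$$P\left\{\sum_{i=1}^{4} X_i > 0 \,\Big|\, X_1 = 5\right\} = P\{X_2 + X_3 + X_4 > -5\} = P\left\{\frac{X_2 + X_3 + X_4 - 4.5}{\sqrt{18}} > \frac{-9.5}{\sqrt{18}}\right\} \approx P\{Z > -2.239\} \approx .9874$$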
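The three normal probabilities in Problem 6.13 can be reproduced numerically. The sketch below assumes what the standardizations imply, namely that the $X_i$ are independent normals with mean 1.5 and variance 6; the helper names `phi` and `tail` are illustrative.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

MEAN, VAR = 1.5, 6.0  # per-variable mean and variance assumed from the solution

def tail(n_vars, threshold):
    """P{sum of n_vars independent N(MEAN, VAR) variables > threshold}."""
    return 1.0 - phi((threshold - n_vars * MEAN) / sqrt(n_vars * VAR))

part_a = tail(4, 0.0)    # P{X1 + X2 + X3 + X4 > 0}
part_b = tail(2, 5.0)    # P{X3 + X4 > 5}
part_c = tail(3, -5.0)   # P{X2 + X3 + X4 > -5}
```

The values agree with the table-based answers to about three decimal places.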


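6.14. In the following, $C$ does not depend on $n$.

$$\begin{aligned}
P\{N = n \mid X = x\} &= f_{X|N}(x|n)\,P\{N = n\}/f_X(x) \\
&= C\,\frac{1}{(n-1)!}(\lambda x)^{n-1}(1 - p)^{n-1} \\
&= C\,(\lambda(1 - p)x)^{n-1}/(n - 1)!
\end{aligned}$$

which shows that, conditional on $X = x$, $N - 1$ is a Poisson random variable with mean $\lambda(1 - p)x$. That is,

$$P\{N = n \mid X = x\} = P\{N - 1 = n - 1 \mid X = x\} = e^{-\lambda(1-p)x}(\lambda(1 - p)x)^{n-1}/(n - 1)!,\quad n \geq 1.$$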


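6.15. (a) The Jacobian of the transformation is

$$J = \begin{vmatrix} 1 & 0 \\ 1 & 1 \end{vmatrix} = 1$$

As the equations $u = x$, $v = x + y$ imply that $x = u$, $y = v - u$, we obtain

$$f_{U,V}(u, v) = f_{X,Y}(u, v - u) = 1,\quad 0 < u < 1,\ 0 < v - u < 1$$

or, equivalently,

$$f_{U,V}(u, v) = 1,\quad \max(v - 1, 0) < u < \min(v, 1)$$

(b) For $0 < v < 1$,

$$f_V(v) = \int_0^v du = v$$

For $1 \leq v \leq 2$,

$$f_V(v) = \int_{v-1}^1 du = 2 - v$$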


6.16. Let U be a uniform random variable on (7, 11). If you bid $x$, $7 \leq x \leq 10$, you will be the high bidder with probability

$$(P\{U < x\})^3 = \left(P\left\{\frac{U - 7}{4} < \frac{x - 7}{4}\right\}\right)^3 = \left(\frac{x - 7}{4}\right)^3$$

Hence, your expected gain, call it $E[G(x)]$, if you bid $x$ is

$$E[G(x)] = \frac{1}{64}(x - 7)^3(10 - x)$$

Calculus shows this is maximized when $x = 37/4$.
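As a quick numerical cross-check of the maximizing bid, the following sketch evaluates the expected gain $(x - 7)^3(10 - x)/64$ on a grid of bids. The grid step of 0.01 and the names `expected_gain` and `best_bid` are choices made for this illustration.

```python
def expected_gain(x):
    """E[G(x)] = (x - 7)^3 (10 - x) / 64 for a bid x in [7, 10]."""
    return (x - 7) ** 3 * (10 - x) / 64

# Grid search over bids 7.00, 7.01, ..., 10.00.
bids = [7 + k / 100 for k in range(301)]
best_bid = max(bids, key=expected_gain)
```

The grid maximizer coincides with the calculus answer 37/4 = 9.25, since setting the derivative to zero gives 3(10 − x) = x − 7.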


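6.17. Let $i_1, i_2,\ldots,i_n$ be a permutation of $1, 2,\ldots,n$. Then

$$\begin{aligned}
P\{X_1 = i_1, X_2 = i_2,\ldots,X_n = i_n\} &= P\{X_1 = i_1\}P\{X_2 = i_2\}\cdots P\{X_n = i_n\} \\
&= p_{i_1}p_{i_2}\cdots p_{i_n} \\
&= p_1p_2\cdots p_n
\end{aligned}$$

Therefore, the desired probability is $n!\,p_1p_2\cdots p_n$, which reduces to $n!/n^n$ when all $p_i = 1/n$.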


6.18. (a) Because $\sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i$, it follows that $N = 2M$.

(b) Consider the $n - k$ coordinates whose Y-values are equal to 0, and call them the red coordinates. Because the $k$ coordinates whose X-values are equal to 1 are equally likely to be any of the $\binom{n}{k}$ sets of $k$ coordinates, it follows that the number of red coordinates among these $k$ coordinates has the same distribution as the number of red balls chosen when one randomly chooses $k$ of a set of $n$ balls of which $n - k$ are red. Therefore, M is a hypergeometric random variable.

(c) $E[N] = E[2M] = 2E[M] = 2k(n - k)/n$

(d) Using the formula for the variance of a hypergeometric given in Example 8j of Chapter 4, we obtain

$$\mathrm{Var}(N) = 4\,\mathrm{Var}(M) = 4\,\frac{n - k}{n - 1}\,k(1 - k/n)(k/n)$$


6.19. (a) First note that $S_n - S_k = \sum_{i=k+1}^{n} Z_i$ is a normal random variable with mean 0 and variance $n - k$ that is independent of $S_k$. Consequently, given that $S_k = y$, $S_n$ is a normal random variable with mean $y$ and variance $n - k$.


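(b) Because the conditional density function of $S_k$ given that $S_n = x$ is a density function whose argument is $y$, anything that does not depend on $y$ can be regarded as a constant. (For instance, $x$ is regarded as a fixed constant.) In the following, the quantities $C_i$, $i = 1, 2, 3, 4$, are all constants that do not depend on $y$:

$$\begin{aligned}
f_{S_k|S_n}(y|x) &= \frac{f_{S_k,S_n}(y, x)}{f_{S_n}(x)} \\
&= C_1 f_{S_n|S_k}(x|y)\,f_{S_k}(y) \quad \left(\text{where } C_1 = \frac{1}{f_{S_n}(x)}\right) \\
&= C_1 \frac{1}{\sqrt{2\pi}\sqrt{n - k}}\,e^{-(x-y)^2/2(n-k)}\,\frac{1}{\sqrt{2\pi}\sqrt{k}}\,e^{-y^2/2k} \\
&= C_2 \exp\left\{-\frac{(x - y)^2}{2(n - k)} - \frac{y^2}{2k}\right\} \\
&= C_3 \exp\left\{\frac{2xy}{2(n - k)} - \frac{y^2}{2(n - k)} - \frac{y^2}{2k}\right\} \\
&= C_3 \exp\left\{-\frac{n}{2k(n - k)}\left(y^2 - 2\frac{k}{n}xy\right)\right\} \\
&= C_3 \exp\left\{-\frac{n}{2k(n - k)}\left[\left(y - \frac{k}{n}x\right)^2 - \left(\frac{k}{n}x\right)^2\right]\right\} \\
&= C_4 \exp\left\{-\frac{n}{2k(n - k)}\left(y - \frac{k}{n}x\right)^2\right\}
\end{aligned}$$

But we recognize the preceding as the density function of a normal random variable with mean $\frac{k}{n}x$ and variance $\frac{k(n - k)}{n}$.
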
6.20. (a)
$$\begin{aligned}
P\{X_6 > X_1 \mid X_1 = \max(X_1,\ldots,X_5)\} &= \frac{P\{X_6 > X_1, X_1 = \max(X_1,\ldots,X_5)\}}{P\{X_1 = \max(X_1,\ldots,X_5)\}} \\
&= \frac{P\{X_6 = \max(X_1,\ldots,X_6), X_1 = \max(X_1,\ldots,X_5)\}}{1/5} \\
&= 5\cdot\frac{1}{6}\cdot\frac{1}{5} = \frac{1}{6}
\end{aligned}$$

Thus, the probability that $X_6$ is the largest value is independent of which is the largest of the other five values. (Of course, this would not be true if the $X_i$ had different distributions.)

(b) One way to solve this problem is to condition on whether $X_6 > X_1$. Now,

$$P\{X_6 > X_2 \mid X_1 = \max(X_1,\ldots,X_5), X_6 > X_1\} = 1$$

Also, by symmetry,

$$P\{X_6 > X_2 \mid X_1 = \max(X_1,\ldots,X_5), X_6 < X_1\} = \frac{1}{2}$$

From part (a),

$$P\{X_6 > X_1 \mid X_1 = \max(X_1,\ldots,X_5)\} = \frac{1}{6}$$

Thus, conditioning on whether $X_6 > X_1$ yields the result

$$P\{X_6 > X_2 \mid X_1 = \max(X_1,\ldots,X_5)\} = \frac{1}{6} + \frac{1}{2}\cdot\frac{5}{6} = \frac{7}{12}$$


6.21.
$$\begin{aligned}
P\{X > s, Y > t\} &= 1 - P(\{X \leq s\} \cup \{Y \leq t\}) \\
&= 1 - P\{X \leq s\} - P\{Y \leq t\} + P\{X \leq s, Y \leq t\}
\end{aligned}$$
6.22. Suppose $j < i$, and consider $P(X_r = i, Y_s = j)$. If there have been $s$ failures after trial $j$ then there have been $j - s$ successes by that point. Hence, the conditional distribution of $X_r$, given that $Y_s = j$, is the distribution of $j$ plus the number of additional trials after trial $j$ until there have been an additional $r - j + s$ successes. Hence, for $j < i$,

$$\begin{aligned}
P(X_r = i, Y_s = j) &= P(Y_s = j)P(X_r = i \mid Y_s = j) \\
&= P(Y_s = j)P(X_{s+r-j} = i - j) \\
&= \binom{j - 1}{s - 1}(1 - p)^s p^{j-s}\binom{i - j - 1}{s + r - j - 1}p^{s+r-j}(1 - p)^{i-s-r},\quad j < i
\end{aligned}$$


6.23. Forx > x9, P(X > xiX > x9) = PGS = ox = 
“0 


ae 
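6.24.
$$\begin{aligned}
\int_{-\infty}^{\infty} f_{X|Y}(x|y)\,f_Y(y)\,dy &= \int_{-\infty}^{\infty} \frac{f(x, y)}{f_Y(y)}\,f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} f(x, y)\,dy \\
&= f_X(x)
\end{aligned}$$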


6.25. (a) pk (1 — Tl - pi) 
(b) Condition on the number of times i would advance if i 
played forever, to obtain aan pk — pi) That = pes 


(c) > yr a — pid T]pi - Pk). 


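6.26. (a) even;

(b) 1;

(c) $\prod_{i=1}^{n}(2a_i - 1)$;

(d)
$$\begin{aligned}
\prod_{i=1}^{n}(2a_i - 1) &= E\left[\prod_{i=1}^{n} Y_i\right] \\
&= P\left(\prod_{i=1}^{n} Y_i = 1\right) - P\left(\prod_{i=1}^{n} Y_i = -1\right) \\
&= 2P\left(\prod_{i=1}^{n} Y_i = 1\right) - 1
\end{aligned}$$

giving that

$$P(S \text{ is even}) = P\left(\prod_{i=1}^{n} Y_i = 1\right) = \frac{1 + \prod_{i=1}^{n}(2a_i - 1)}{2}$$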
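6.27. For $0 < x < 1$,

$$\begin{aligned}
f_{X|N}(x|n) &= \frac{P(N = n \mid X = x)\,f_X(x)}{P(N = n)} \\
&= \frac{\binom{n+m}{n}x^n(1 - x)^m\,x^{a-1}(1 - x)^{b-1}}{B(a, b)P(N = n)} \\
&= Kx^{n+a-1}(1 - x)^{m+b-1}
\end{aligned}$$

where $K = \dfrac{\binom{n+m}{n}}{B(a, b)P(N = n)}$ does not depend on $x$. Hence, we can conclude that the conditional density of X given that $N = n$ is beta with parameters $n + a$, $m + b$. As a byproduct, we also see that

$$\frac{\binom{n+m}{n}}{B(a, b)P(N = n)} = \frac{1}{B(a + n, b + m)}$$

or equivalently that

$$P(N = n) = \binom{n+m}{n}\frac{B(a + n, b + m)}{B(a, b)}$$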


Chapter 7 


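7.1. (a) $d = \sum_{i=1}^{m} 1/n(i)$

(b) $P\{X = i\} = P\{[mU] = i - 1\} = P\{i - 1 \leq mU < i\} = 1/m$, $i = 1,\ldots,m$

(c) $E\left[\dfrac{m}{n(X)}\right] = \sum_{i=1}^{m} \dfrac{m}{n(i)}P\{X = i\} = \sum_{i=1}^{m} \dfrac{m}{n(i)}\,\dfrac{1}{m} = d$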
7.2. Let $I_j$ equal 1 if the $j$th ball withdrawn is white and the $(j + 1)$st is black, and let $I_j$ equal 0 otherwise. If X is the number of instances in which a white ball is immediately followed by a black one, then we may express X as

$$X = \sum_{j=1}^{n+m-1} I_j$$

Thus,

$$\begin{aligned}
E[X] &= \sum_{j=1}^{n+m-1} E[I_j] \\
&= \sum_{j=1}^{n+m-1} P\{j\text{th selection is white, } (j + 1)\text{st is black}\} \\
&= \sum_{j=1}^{n+m-1} P\{j\text{th selection is white}\}P\{(j + 1)\text{st is black} \mid j\text{th is white}\} \\
&= \sum_{j=1}^{n+m-1} \frac{n}{n + m}\,\frac{m}{n + m - 1} \\
&= \frac{nm}{n + m}
\end{aligned}$$


The preceding used the fact that each of the n + m balls is 
equally likely to be the jth one selected and, given that that 
selection is a white ball, each of the other n + m — 1 balls 
is equally likely to be the next ball chosen. 


7.3. Arbitrarily number the couples, and then let $I_j$ equal 1 if married couple number $j$, $j = 1,\ldots,10$, is seated at the same table. Then, if X represents the number of married couples that are seated at the same table, we have

$$X = \sum_{j=1}^{10} I_j$$

so

$$E[X] = \sum_{j=1}^{10} E[I_j]$$

(a) To compute $E[I_j]$ in this case, consider wife number $j$. Since each of the $\binom{19}{3}$ groups of size 3 not including her is equally likely to be the remaining members of her table, it follows that the probability that her husband is at her table is

$$\frac{\binom{1}{1}\binom{18}{2}}{\binom{19}{3}} = \frac{3}{19}$$

Hence, $E[I_j] = 3/19$ and so

$$E[X] = 30/19$$

(b) In this case, since the 2 men at the table of wife $j$ are equally likely to be any of the 10 men, it follows that the probability that one of them is her husband is 2/10, so

$$E[I_j] = 2/10 \quad\text{and}\quad E[X] = 2$$


7.4. From Example 2i, we know that the expected number of times that the die need be rolled until all sides have appeared at least once is $6(1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6) = 14.7$. Now, if we let $X_i$ denote the total number of times that side $i$ appears, then, since $\sum_{i=1}^{6} X_i$ is equal to the total number of rolls, we have

$$14.7 = E\left[\sum_{i=1}^{6} X_i\right] = \sum_{i=1}^{6} E[X_i]$$

But, by symmetry, $E[X_i]$ will be the same for all $i$, and thus it follows from the preceding that $E[X_1] = 14.7/6 = 2.45$.
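The arithmetic in Problem 7.4 is easy to confirm with exact rational numbers. This is only a check of the two figures quoted in the solution; the variable names are illustrative.

```python
from fractions import Fraction

sides = 6
# Expected rolls until every side has appeared: 6 * (1 + 1/2 + ... + 1/6).
expected_rolls = sides * sum(Fraction(1, k) for k in range(1, sides + 1))
# By symmetry each side accounts for an equal share of those rolls.
expected_per_side = expected_rolls / sides
```

Here `expected_rolls` is the fraction 147/10 and `expected_per_side` is 49/20, matching 14.7 and 2.45.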


7.5. Let $I_j$ equal 1 if we win 1 when the $j$th red card to show is turned over, and let $I_j$ equal 0 otherwise. (For instance, $I_1$ will equal 1 if the first card turned over is red.) Hence, if X is our total winnings, then

$$E[X] = E\left[\sum_{j=1}^{n} I_j\right] = \sum_{j=1}^{n} E[I_j]$$

Now, $I_j$ will equal 1 if $j$ red cards appear before $j$ black cards. By symmetry, the probability of this event is equal to 1/2; therefore, $E[I_j] = 1/2$ and $E[X] = n/2$.

7.6. To see that $N \leq n - 1 + I$, note that if all events occur, then both sides of the preceding inequality are equal to $n$, whereas if they do not all occur, then the inequality reduces to $N \leq n - 1$, which is clearly true in this case. Taking expectations yields

$$E[N] \leq n - 1 + E[I]$$

However, if we let $I_i$ equal 1 if $A_i$ occurs and 0 otherwise, then

$$E[N] = E\left[\sum_{i=1}^{n} I_i\right] = \sum_{i=1}^{n} E[I_i] = \sum_{i=1}^{n} P(A_i)$$

Since $E[I] = P(A_1\cdots A_n)$, the result follows.
7.7. Imagine that the values $1, 2,\ldots,n$ are lined up in their numerical order and that the $k$ values selected are considered special. From Example 3e, the position of the first special value, equal to the smallest value chosen, has mean $1 + \dfrac{n - k}{k + 1} = \dfrac{n + 1}{k + 1}$.
For a more formal argument, note that $X \geq j$ if none of the $j - 1$ smallest values are chosen. Hence,

$$P\{X \geq j\} = \frac{\binom{n - j + 1}{k}}{\binom{n}{k}} = \frac{\binom{n - k}{j - 1}}{\binom{n}{j - 1}}$$

which shows that X has the same distribution as the random variable of Example 3e (with the notational change that the total number of balls is now $n$ and the number of special balls is $k$).

7.8. Let X denote the number of families that depart after the Sanchez family leaves. Arbitrarily number all the $N - 1$ non-Sanchez families, and let $I_r$, $1 \leq r \leq N - 1$, equal 1 if family $r$ departs after the Sanchez family does. Then

$$X = \sum_{r=1}^{N-1} I_r$$

Taking expectations gives

$$E[X] = \sum_{r=1}^{N-1} P\{\text{family } r \text{ departs after the Sanchez family}\}$$

Now consider any non-Sanchez family that checked in $k$ pieces of luggage. Because each of the $k + j$ pieces of luggage checked in either by this family or by the Sanchez family is equally likely to be the last of these $k + j$ to appear, the probability that this family departs after the Sanchez family is $\frac{k}{k + j}$. Because the number of non-Sanchez families who checked in $k$ pieces of luggage is $n_k$ when $k \neq j$, or $n_j - 1$ when $k = j$, we obtain

$$E[X] = \sum_{k} \frac{k\,n_k}{k + j} - \frac{1}{2}$$

7.9. Let the neighborhood of any point on the rim be the arc starting at that point and extending for a length 1. Consider a uniformly chosen point on the rim of the circle (that is, the probability that this point lies on a specified arc of length $x$ is $\frac{x}{2\pi}$) and let X denote the number of points that lie in its neighborhood. With $I_j$ defined to equal 1 if item number $j$ is in the neighborhood of the random point and to equal 0 otherwise, we have

$$X = \sum_{j=1}^{19} I_j$$

Taking expectations gives

$$E[X] = \sum_{j=1}^{19} P\{\text{item } j \text{ lies in the neighborhood of the random point}\}$$

But because item $j$ will lie in its neighborhood if the random point is located on the arc of length 1 going from item $j$ in the counterclockwise direction, it follows that

$$P\{\text{item } j \text{ lies in the neighborhood of the random point}\} = \frac{1}{2\pi}$$

Hence,

$$E[X] = \frac{19}{2\pi} > 3$$

Because $E[X] > 3$, at least one of the possible values of X must exceed 3, proving the result.


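7.10. If $g(x) = x^{1/2}$, then

$$g'(x) = \frac{1}{2}x^{-1/2},\qquad g''(x) = -\frac{1}{4}x^{-3/2}$$

so the Taylor series expansion of $\sqrt{x}$ about $\lambda$ gives

$$\sqrt{X} \approx \sqrt{\lambda} + \frac{1}{2}\lambda^{-1/2}(X - \lambda) - \frac{1}{8}\lambda^{-3/2}(X - \lambda)^2$$

Taking expectations yields

$$\begin{aligned}
E[\sqrt{X}] &\approx \sqrt{\lambda} + \frac{1}{2}\lambda^{-1/2}E[X - \lambda] - \frac{1}{8}\lambda^{-3/2}E[(X - \lambda)^2] \\
&= \sqrt{\lambda} - \frac{1}{8}\lambda^{-3/2}\lambda \\
&= \sqrt{\lambda} - \frac{1}{8}\lambda^{-1/2}
\end{aligned}$$

Hence,

$$\begin{aligned}
\mathrm{Var}(\sqrt{X}) &= E[X] - (E[\sqrt{X}])^2 \\
&\approx \lambda - \left(\sqrt{\lambda} - \frac{1}{8}\lambda^{-1/2}\right)^2 \\
&= 1/4 - \frac{1}{64\lambda} \\
&\approx 1/4
\end{aligned}$$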
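The approximation $\mathrm{Var}(\sqrt{X}) \approx 1/4$ can be compared against a direct numerical evaluation. The sketch below assumes, as the use of $E[X] = \mathrm{Var}(X) = \lambda$ in the solution indicates, that X is Poisson with mean $\lambda$; the function name and the truncation point `terms` are illustrative choices, and the code is intended only for moderate $\lambda$ (for very large $\lambda$, `exp(-lam)` underflows).

```python
from math import exp, sqrt

def var_sqrt_poisson(lam, terms=2000):
    """Var(sqrt(X)) for X ~ Poisson(lam), summing the pmf over 0..terms-1."""
    p = exp(-lam)           # P{X = 0}
    mean_sqrt = 0.0
    for k in range(terms):
        mean_sqrt += sqrt(k) * p
        p *= lam / (k + 1)  # P{X = k + 1} obtained from P{X = k}
    # E[(sqrt(X))^2] = E[X] = lam
    return lam - mean_sqrt ** 2
```

For instance, `var_sqrt_poisson(100.0)` is close to 0.25, which is the content of the approximation above.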


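7.11. Number the tables so that tables 1, 2, and 3 are the ones with four seats and tables 4, 5, 6, and 7 are the ones with two seats. Also, number the women, and let $X_{i,j}$ equal 1 if woman $i$ is seated with her husband at table $j$. Note that

$$E[X_{i,j}] = \frac{\binom{2}{2}\binom{18}{2}}{\binom{20}{4}} = \frac{3}{95},\quad j = 1, 2, 3$$

and

$$E[X_{i,j}] = \frac{1}{\binom{20}{2}} = \frac{1}{190},\quad j = 4, 5, 6, 7$$

Now, if X denotes the number of married couples that are seated at the same table, we have

$$\begin{aligned}
E[X] &= E\left[\sum_{i=1}^{10}\sum_{j=1}^{7} X_{i,j}\right] \\
&= \sum_{i=1}^{10}\sum_{j=1}^{3} E[X_{i,j}] + \sum_{i=1}^{10}\sum_{j=4}^{7} E[X_{i,j}] \\
&= 30\cdot\frac{3}{95} + 40\cdot\frac{1}{190} = \frac{22}{19}
\end{aligned}$$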


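7.12. Let $X_i$ equal 1 if individual $i$ does not recruit anyone, and let $X_i$ equal 0 otherwise. Then

$$\begin{aligned}
E[X_i] &= P\{i \text{ does not recruit any of } i + 1, i + 2,\ldots,n\} \\
&= \frac{i - 1}{i}\,\frac{i}{i + 1}\cdots\frac{n - 2}{n - 1} \\
&= \frac{i - 1}{n - 1}
\end{aligned}$$

Hence,

$$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n}\frac{i - 1}{n - 1} = \frac{n}{2}$$

From the preceding, we also obtain

$$\mathrm{Var}(X_i) = \frac{i - 1}{n - 1}\left(1 - \frac{i - 1}{n - 1}\right) = \frac{(i - 1)(n - i)}{(n - 1)^2}$$

Now, for $i < j$,

$$\begin{aligned}
E[X_iX_j] &= \frac{i - 1}{i}\cdots\frac{j - 2}{j - 1}\cdot\frac{j - 2}{j}\,\frac{j - 1}{j + 1}\cdots\frac{n - 3}{n - 1} \\
&= \frac{(i - 1)(j - 2)}{(n - 2)(n - 1)}
\end{aligned}$$

Thus,

$$\begin{aligned}
\mathrm{Cov}(X_i, X_j) &= \frac{(i - 1)(j - 2)}{(n - 2)(n - 1)} - \frac{i - 1}{n - 1}\,\frac{j - 1}{n - 1} \\
&= \frac{(i - 1)(j - n)}{(n - 2)(n - 1)^2}
\end{aligned}$$

Therefore,

$$\begin{aligned}
\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) &= \sum_{i=1}^{n}\mathrm{Var}(X_i) + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\mathrm{Cov}(X_i, X_j) \\
&= \sum_{i=1}^{n}\frac{(i - 1)(n - i)}{(n - 1)^2} + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\frac{(i - 1)(j - n)}{(n - 2)(n - 1)^2} \\
&= \frac{1}{(n - 1)^2}\sum_{i=1}^{n}(i - 1)(n - i) - \frac{1}{(n - 2)(n - 1)^2}\sum_{i=1}^{n-1}(i - 1)(n - i)(n - i - 1)
\end{aligned}$$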
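The mean $n/2$ and the closing variance expression of Problem 7.12 can be checked exactly for small $n$ by enumerating every recruiting history. The sketch assumes the model implied by the product formula for $E[X_i]$: individual $k$ (for $k = 2,\ldots,n$) is recruited by a uniformly chosen member of $1,\ldots,k - 1$. The function names are illustrative, and `variance_formula` requires $n \geq 3$.

```python
from fractions import Fraction
from itertools import product

def non_recruiter_moments(n):
    """Exact mean and variance of the number of individuals who recruit no one,
    assuming individual k (k = 2..n) is recruited by a uniform member of 1..k-1."""
    outcomes = list(product(*(range(1, k) for k in range(2, n + 1))))
    # Anyone who appears as a recruiter has recruited someone.
    counts = [n - len(set(recruiters)) for recruiters in outcomes]
    mean = Fraction(sum(counts), len(outcomes))
    second = Fraction(sum(c * c for c in counts), len(outcomes))
    return mean, second - mean ** 2

def variance_formula(n):
    """Variance from the closing expression of the solution (needs n >= 3)."""
    first = Fraction(sum((i - 1) * (n - i) for i in range(1, n + 1)), (n - 1) ** 2)
    second = Fraction(sum((i - 1) * (n - i) * (n - i - 1) for i in range(1, n)),
                      (n - 2) * (n - 1) ** 2)
    return first - second
```

Calling `non_recruiter_moments(5)` enumerates all 24 equally likely histories; the resulting mean is 5/2 and the variance agrees with `variance_formula(5)`.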


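7.13. Let $X_i$ equal 1 if the $i$th triple consists of one of each type of player. Then

$$E[X_i] = \frac{\binom{2}{1}\binom{3}{1}\binom{4}{1}}{\binom{9}{3}} = \frac{2}{7}$$

Hence, for part (a), we obtain

$$E\left[\sum_{i=1}^{3} X_i\right] = 6/7$$

It follows from the preceding that

$$\mathrm{Var}(X_i) = (2/7)(1 - 2/7) = 10/49$$

Also, for $i \neq j$,

$$\begin{aligned}
E[X_iX_j] &= P\{X_i = 1, X_j = 1\} \\
&= P\{X_i = 1\}P\{X_j = 1 \mid X_i = 1\} \\
&= \frac{2}{7}\cdot\frac{\binom{1}{1}\binom{2}{1}\binom{3}{1}}{\binom{6}{3}} \\
&= 6/70
\end{aligned}$$

Hence, for part (b), we obtain

$$\begin{aligned}
\mathrm{Var}\left(\sum_{i=1}^{3} X_i\right) &= \sum_{i=1}^{3}\mathrm{Var}(X_i) + 2\sum\sum_{j > i}\mathrm{Cov}(X_i, X_j) \\
&= 30/49 + 2\binom{3}{2}\left(\frac{6}{70} - \frac{4}{49}\right) \\
&= \frac{312}{490}
\end{aligned}$$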
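The numbers in Problem 7.13 can be reproduced with exact fractions. The roster sizes used below (2, 3, and 4 players of the three types, nine players split into three triples) are an assumption chosen because they reproduce the stated values 2/7 and 6/70; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Assumed roster: 2, 3 and 4 players of the three types (9 in all).
p_one = Fraction(2 * 3 * 4, comb(9, 3))           # P{X_i = 1}
# Given triple i has one of each type, 1, 2 and 3 players of each type remain.
p_both = p_one * Fraction(1 * 2 * 3, comb(6, 3))  # P{X_i = 1, X_j = 1}

mean_total = 3 * p_one
var_single = p_one * (1 - p_one)
cov_pair = p_both - p_one ** 2
var_total = 3 * var_single + 2 * comb(3, 2) * cov_pair
```

With these inputs `mean_total` is 6/7 and `var_total` equals 312/490 (which reduces to 156/245).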


7.14. Let $X_i$, $i = 1,\ldots,13$, equal 1 if the $i$th card is an ace and let $X_i$ be 0 otherwise. Let $Y_j$ equal 1 if the $j$th card is a spade and let $Y_j = 0$ otherwise. Now,

$$\mathrm{Cov}(X, Y) = \mathrm{Cov}\left(\sum_{i=1}^{13} X_i, \sum_{j=1}^{13} Y_j\right) = \sum_{i=1}^{13}\sum_{j=1}^{13}\mathrm{Cov}(X_i, Y_j)$$

However, $X_i$ is clearly independent of $Y_j$ because knowing the suit of a particular card gives no information about whether it is an ace and thus cannot affect the probability that another specified card is an ace. More formally, let $A_{j,s}$, $A_{j,h}$, $A_{j,d}$, $A_{j,c}$ be the events, respectively, that card $j$ is a spade, a heart, a diamond, and a club. Then

$$P\{X_i = 1\} = \frac{1}{4}\left(P\{X_i = 1 \mid A_{j,s}\} + P\{X_i = 1 \mid A_{j,h}\} + P\{X_i = 1 \mid A_{j,d}\} + P\{X_i = 1 \mid A_{j,c}\}\right)$$

But, by symmetry, we have

$$P\{X_i = 1 \mid A_{j,s}\} = P\{X_i = 1 \mid A_{j,h}\} = P\{X_i = 1 \mid A_{j,d}\} = P\{X_i = 1 \mid A_{j,c}\}$$

Therefore,

$$P\{X_i = 1\} = P\{X_i = 1 \mid A_{j,s}\}$$

As the preceding implies that

$$P\{X_i = 1\} = P\{X_i = 1 \mid A_{j,s}^c\}$$

we see that $Y_j$ and $X_i$ are independent. Hence, $\mathrm{Cov}(X_i, Y_j) = 0$, and thus $\mathrm{Cov}(X, Y) = 0$.

The random variables X and Y, although uncorrelated, are not independent. This follows, for instance, from the fact that

$$P\{Y = 13 \mid X = 4\} = 0 \neq P\{Y = 13\}$$


7.15. (a) Your expected gain without any information is 0.
(b) You should predict heads if $p > 1/2$ and tails otherwise.
(c) Conditioning on V, the value of the coin, gives

$$\begin{aligned}
E[\text{Gain}] &= \int_0^1 E[\text{Gain} \mid V = p]\,dp \\
&= \int_0^{1/2} [1(1 - p) - 1(p)]\,dp + \int_{1/2}^{1} [1(p) - 1(1 - p)]\,dp \\
&= 1/2
\end{aligned}$$

7.16. Given that the name chosen appears in $n(X)$ different positions on the list, since each of these positions is equally likely to be the one chosen, it follows that

$$E[I \mid n(X)] = P\{I = 1 \mid n(X)\} = 1/n(X)$$

Hence,

$$E[I] = E[1/n(X)]$$

Thus, $E[mI] = E[m/n(X)] = d$.


7.17. Letting $X_i$ equal 1 if a collision occurs when the $i$th item is placed, and letting it equal 0 otherwise, we can express the total number of collisions X as

$$X = \sum_{i=1}^{m} X_i$$

Therefore,

$$E[X] = \sum_{i=1}^{m} E[X_i]$$

To determine $E[X_i]$, condition on the cell in which it is placed.

$$\begin{aligned}
E[X_i] &= \sum_{j} E[X_i \mid \text{placed in cell } j]\,p_j \\
&= \sum_{j} P\{i \text{ causes collision} \mid \text{placed in cell } j\}\,p_j \\
&= \sum_{j} [1 - (1 - p_j)^{i-1}]\,p_j \\
&= 1 - \sum_{j} (1 - p_j)^{i-1}p_j
\end{aligned}$$

The next to last equality used the fact that, conditional on item $i$ being placed in cell $j$, item $i$ will cause a collision if any of the preceding $i - 1$ items were put in cell $j$. Thus,

$$E[X] = m - \sum_{i=1}^{m}\sum_{j=1}^{n} (1 - p_j)^{i-1}p_j$$

Interchanging the order of the summations gives

$$E[X] = m - n + \sum_{j=1}^{n} (1 - p_j)^m$$

Looking at the result shows that we could have derived it more easily by taking expectations of both sides of the identity

$$\text{number of nonempty cells} = m - X$$


The expected number of nonempty cells is then found by 
defining an indicator variable for each cell, equal to 1 if that 
cell is nonempty and to 0 otherwise, and then taking the 
expectation of the sum of these indicator variables. 
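The closed form $E[X] = m - n + \sum_j (1 - p_j)^m$ can be compared with a brute-force expectation over all placements for small cases. Both function names below are illustrative; the brute-force version uses the identity that the number of collisions equals $m$ minus the number of nonempty cells.

```python
from itertools import product

def expected_collisions(p, m):
    """E[X] = m - n + sum_j (1 - p_j)^m for m items and cell probabilities p."""
    return m - len(p) + sum((1 - pj) ** m for pj in p)

def expected_collisions_bruteforce(p, m):
    """Same expectation by enumerating every assignment of the m items to cells."""
    total = 0.0
    for cells in product(range(len(p)), repeat=m):
        prob = 1.0
        for c in cells:
            prob *= p[c]
        # Collisions = items placed minus distinct occupied cells.
        total += prob * (m - len(set(cells)))
    return total
```

For two equally likely cells and two items, both functions give 0.5: a collision occurs exactly when the second item lands in the first item's cell.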


7.18. Let L denote the length of the initial run. Conditioning on the first value gives

$$E[L] = E[L \mid \text{first value is one}]\,\frac{n}{n + m} + E[L \mid \text{first value is zero}]\,\frac{m}{n + m}$$

Now, if the first value is one, then the length of the run will be the position of the first zero when considering the remaining $n + m - 1$ values, of which $n - 1$ are ones and $m$ are zeroes. (For instance, if the initial value of the remaining $n + m - 1$ is zero, then $L = 1$.) As a similar result is true given that the first value is a zero, we obtain from the preceding, upon using the result from Example 3e, that

$$\begin{aligned}
E[L] &= \frac{n + m}{m + 1}\,\frac{n}{n + m} + \frac{n + m}{n + 1}\,\frac{m}{n + m} \\
&= \frac{n}{m + 1} + \frac{m}{n + 1}
\end{aligned}$$
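The formula $E[L] = n/(m + 1) + m/(n + 1)$ can be verified for small cases by listing every distinct arrangement of the ones and zeroes, each of which is equally likely. The function name is illustrative, and the enumeration is practical only for small $n + m$.

```python
from fractions import Fraction
from itertools import permutations

def expected_initial_run(n, m):
    """Exact E[L] by enumerating all distinct orderings of n ones and m zeros."""
    orderings = set(permutations([1] * n + [0] * m))
    total = 0
    for seq in orderings:
        run = 1
        # Extend the run while the next value matches the first value.
        while run < len(seq) and seq[run] == seq[0]:
            run += 1
        total += run
    return Fraction(total, len(orderings))
```

For example, with $n = 3$ and $m = 2$ the formula gives $3/3 + 2/4 = 3/2$, and the enumeration returns the same fraction.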


7.19. Let X be the number of flips needed for both boxes to become empty, and let Y denote the number of heads in the first $n + m$ flips. Then

$$\begin{aligned}
E[X] &= \sum_{i=0}^{n+m} E[X \mid Y = i]\,P\{Y = i\} \\
&= \sum_{i=0}^{n+m} E[X \mid Y = i]\binom{n + m}{i}p^i(1 - p)^{n+m-i}
\end{aligned}$$

Now, if the number of heads in the first $n + m$ flips is $i$, $i \leq n$, then the number of additional flips is the number of flips needed to obtain an additional $n - i$ heads. Similarly, if the number of heads in the first $n + m$ flips is $i$, $i > n$, then, because there would have been a total of $n + m - i < m$ tails, the number of additional flips is the number needed to obtain an additional $i - n$ tails. Since the number of flips needed for $j$ outcomes of a particular type is a negative binomial random variable whose mean is $j$ divided by the probability of that outcome, we obtain

$$E[X] = \sum_{i=0}^{n}\left(n + m + \frac{n - i}{p}\right)\binom{n + m}{i}p^i(1 - p)^{n+m-i} + \sum_{i=n+1}^{n+m}\left(n + m + \frac{i - n}{1 - p}\right)\binom{n + m}{i}p^i(1 - p)^{n+m-i}$$


7.20. Taking expectations of both sides of the identity given in the hint yields

$$\begin{aligned}
E[X^n] &= E\left[n\int_0^\infty x^{n-1}I_X(x)\,dx\right] \\
&= n\int_0^\infty E[x^{n-1}I_X(x)]\,dx \\
&= n\int_0^\infty x^{n-1}E[I_X(x)]\,dx \\
&= n\int_0^\infty x^{n-1}\bar{F}(x)\,dx
\end{aligned}$$

Taking the expectation inside the integral sign is justified because all the random variables $I_X(x)$, $0 < x < \infty$, are nonnegative.

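7.21. Consider a random permutation $I_1,\ldots,I_n$ that is equally likely to be any of the $n!$ permutations. Then

$$\begin{aligned}
E[a_{I_j}a_{I_{j+1}}] &= \sum_{k} E[a_{I_j}a_{I_{j+1}} \mid I_j = k]\,P\{I_j = k\} \\
&= \frac{1}{n}\sum_{k} a_k E[a_{I_{j+1}} \mid I_j = k] \\
&= \frac{1}{n}\sum_{k} a_k \sum_{i} a_i P\{I_{j+1} = i \mid I_j = k\} \\
&= \frac{1}{n(n - 1)}\sum_{k} a_k \sum_{i \neq k} a_i \\
&= \frac{1}{n(n - 1)}\sum_{k} a_k(-a_k) \\
&< 0
\end{aligned}$$

where the final equality followed from the assumption that $\sum_{i=1}^{n} a_i = 0$. Since the preceding shows that

$$E\left[\sum_{j=1}^{n-1} a_{I_j}a_{I_{j+1}}\right] < 0$$

it follows that there must be some permutation $i_1,\ldots,i_n$ for which

$$\sum_{j=1}^{n-1} a_{i_j}a_{i_{j+1}} < 0$$

7.22. (a) $E[X] = \lambda_1 + \lambda_2$, $E[Y] = \lambda_2 + \lambda_3$

(b)
$$\begin{aligned}
\mathrm{Cov}(X, Y) &= \mathrm{Cov}(X_1 + X_2, X_2 + X_3) \\
&= \mathrm{Cov}(X_1, X_2 + X_3) + \mathrm{Cov}(X_2, X_2 + X_3) \\
&= \mathrm{Cov}(X_2, X_2) \\
&= \mathrm{Var}(X_2) \\
&= \lambda_2
\end{aligned}$$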


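(c) Conditioning on $X_2$ gives

$$\begin{aligned}
P\{X = i, Y = j\} &= \sum_{k} P\{X = i, Y = j \mid X_2 = k\}P\{X_2 = k\} \\
&= \sum_{k} P\{X_1 = i - k, X_3 = j - k \mid X_2 = k\}\,e^{-\lambda_2}\lambda_2^k/k! \\
&= \sum_{k} P\{X_1 = i - k, X_3 = j - k\}\,e^{-\lambda_2}\lambda_2^k/k! \\
&= \sum_{k} P\{X_1 = i - k\}P\{X_3 = j - k\}\,e^{-\lambda_2}\lambda_2^k/k! \\
&= \sum_{k=0}^{\min(i,j)} e^{-\lambda_1}\frac{\lambda_1^{i-k}}{(i - k)!}\,e^{-\lambda_3}\frac{\lambda_3^{j-k}}{(j - k)!}\,e^{-\lambda_2}\frac{\lambda_2^k}{k!}
\end{aligned}$$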


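7.23.
$$\begin{aligned}
\mathrm{Corr}\left(\sum_{i} X_i, \sum_{j} Y_j\right) &= \frac{\mathrm{Cov}\left(\sum_{i} X_i, \sum_{j} Y_j\right)}{\sqrt{\mathrm{Var}\left(\sum_{i} X_i\right)\mathrm{Var}\left(\sum_{j} Y_j\right)}} \\
&= \frac{\sum_{i}\sum_{j}\mathrm{Cov}(X_i, Y_j)}{\sqrt{n\sigma_x^2\,n\sigma_y^2}} \\
&= \frac{\sum_{i}\mathrm{Cov}(X_i, Y_i) + \sum_{i}\sum_{j \neq i}\mathrm{Cov}(X_i, Y_j)}{n\sigma_x\sigma_y} \\
&= \frac{n\rho\sigma_x\sigma_y}{n\sigma_x\sigma_y} \\
&= \rho
\end{aligned}$$

where the next to last equality used the fact that $\mathrm{Cov}(X_i, Y_i) = \rho\sigma_x\sigma_y$.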


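7.24. Let $X_i$ equal 1 if the $i$th card chosen is an ace, and let it equal 0 otherwise. Because

$$X = \sum_{i=1}^{3} X_i$$

and $E[X_i] = P\{X_i = 1\} = 1/13$, it follows that $E[X] = 3/13$. But, with A being the event that the ace of spades is chosen, we have

$$\begin{aligned}
E[X] &= E[X \mid A]P(A) + E[X \mid A^c]P(A^c) \\
&= E[X \mid A]\frac{3}{52} + E[X \mid A^c]\frac{49}{52} \\
&= E[X \mid A]\frac{3}{52} + \frac{49}{52}E\left[\sum_{i=1}^{3} X_i \,\Big|\, A^c\right] \\
&= E[X \mid A]\frac{3}{52} + \frac{49}{52}\sum_{i=1}^{3} E[X_i \mid A^c] \\
&= E[X \mid A]\frac{3}{52} + \frac{49}{52}\cdot 3\cdot\frac{3}{51}
\end{aligned}$$

Using that $E[X] = 3/13$ gives the result

$$E[X \mid A] = \frac{52}{3}\left(\frac{3}{13} - \frac{49}{52}\cdot\frac{3}{17}\right) = \frac{19}{17} \approx 1.1176$$

Similarly, letting L be the event that at least one ace is chosen, we have

$$\begin{aligned}
E[X] &= E[X \mid L]P(L) + E[X \mid L^c]P(L^c) \\
&= E[X \mid L]P(L) \\
&= E[X \mid L]\left(1 - \frac{48\cdot 47\cdot 46}{52\cdot 51\cdot 50}\right)
\end{aligned}$$

Thus,

$$E[X \mid L] = \frac{3/13}{1 - \frac{48\cdot 47\cdot 46}{52\cdot 51\cdot 50}} \approx 1.0616$$

Another way to solve this problem is to number the four aces, with the ace of spades having number 1, and then let $Y_i$ equal 1 if ace number $i$ is chosen and 0 otherwise. Then

$$\begin{aligned}
E[X \mid A] &= E\left[\sum_{i=1}^{4} Y_i \,\Big|\, Y_1 = 1\right] \\
&= 1 + \sum_{i=2}^{4} E[Y_i \mid Y_1 = 1] \\
&= 1 + 3\cdot\frac{2}{51} = 19/17
\end{aligned}$$

where we used the fact that, given that the ace of spades is chosen, the other two cards are equally likely to be any pair of the remaining 51 cards; so the conditional probability that any specified card (not equal to the ace of spades) is chosen is 2/51. Also,

$$E[X \mid L] = E\left[\sum_{i=1}^{4} Y_i \,\Big|\, L\right] = \sum_{i=1}^{4} E[Y_i \mid L] = 4P\{Y_1 = 1 \mid L\}$$

Because

$$P\{Y_1 = 1 \mid L\} = P(A \mid L) = \frac{P(AL)}{P(L)} = \frac{P(A)}{P(L)} = \frac{3/52}{1 - \frac{48\cdot 47\cdot 46}{52\cdot 51\cdot 50}}$$

we obtain the same answer as before.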
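The two conditional expectations in Problem 7.24 can be recomputed with exact fractions, following the first method of the solution (three cards drawn from a 52-card deck). The variable names are illustrative.

```python
from fractions import Fraction

p_A = Fraction(3, 52)                    # P(ace of spades is among the 3 cards)
mean_X = Fraction(3, 13)                 # E[X]
mean_given_not_A = 3 * Fraction(3, 51)   # E[X | A^c]: 3 other aces among 51 cards
# Solve E[X] = E[X|A] P(A) + E[X|A^c] P(A^c) for E[X|A].
mean_given_A = (mean_X - mean_given_not_A * (1 - p_A)) / p_A

p_L = 1 - Fraction(48 * 47 * 46, 52 * 51 * 50)  # P(at least one ace)
mean_given_L = mean_X / p_L
```

`mean_given_A` comes out as the exact fraction 19/17, and `float(mean_given_L)` agrees with 1.0616 to four decimal places.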


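7.25. (a)
$$E[I \mid X = x] = P\{Z < X \mid X = x\} = P\{Z < x \mid X = x\} = P\{Z < x\} = \Phi(x)$$

(b) It follows from part (a) that $E[I \mid X] = \Phi(X)$. Therefore,

$$E[I] = E[E[I \mid X]] = E[\Phi(X)]$$

The result now follows because $E[I] = P\{I = 1\} = P\{Z < X\}$.

(c) Since $X - Z$ is normal with mean $\mu$ and variance 2, we have

$$\begin{aligned}
P\{X > Z\} &= P\{X - Z > 0\} \\
&= P\left\{\frac{X - Z - \mu}{\sqrt{2}} > \frac{-\mu}{\sqrt{2}}\right\} \\
&= 1 - \Phi\left(\frac{-\mu}{\sqrt{2}}\right) \\
&= \Phi\left(\frac{\mu}{\sqrt{2}}\right)
\end{aligned}$$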


7.26. Let N be the number of heads in the first $n + m - 1$ flips. Let $M = \max(X, Y)$ be the number of flips needed to amass at least $n$ heads and at least $m$ tails. Conditioning on N gives

$$\begin{aligned}
E[M] &= \sum_{i} E[M \mid N = i]\,P\{N = i\} \\
&= \sum_{i=0}^{n-1} E[M \mid N = i]\,P\{N = i\} + \sum_{i=n}^{n+m-1} E[M \mid N = i]\,P\{N = i\}
\end{aligned}$$

Now, suppose we are given that there are a total of $i$ heads in the first $n + m - 1$ trials. If $i < n$, then we have already obtained at least $m$ tails, so the additional number of flips needed is equal to the number needed for an additional $n - i$ heads; similarly, if $i \geq n$, then we have already obtained at least $n$ heads, so the additional number of flips needed is equal to the number needed for an additional $m - (n + m - 1 - i)$ tails. Consequently, we have

$$\begin{aligned}
E[M] &= \sum_{i=0}^{n-1}\left(n + m - 1 + \frac{n - i}{p}\right)P\{N = i\} + \sum_{i=n}^{n+m-1}\left(n + m - 1 + \frac{i + 1 - n}{1 - p}\right)P\{N = i\} \\
&= n + m - 1 + \sum_{i=0}^{n-1}\frac{n - i}{p}\binom{n + m - 1}{i}p^i(1 - p)^{n+m-1-i} \\
&\quad + \sum_{i=n}^{n+m-1}\frac{i + 1 - n}{1 - p}\binom{n + m - 1}{i}p^i(1 - p)^{n+m-1-i}
\end{aligned}$$

The expected number of flips to obtain either $n$ heads or $m$ tails, $E[\min(X, Y)]$, is now given by

$$E[\min(X, Y)] = E[X + Y - M] = \frac{n}{p} + \frac{m}{1 - p} - E[M]$$


7.27. This is just the expected time to collect $n - 1$ of the $n$ types of coupons in Example 2i. By the results of that example the solution is

$$1 + \frac{n}{n - 1} + \frac{n}{n - 2} + \cdots + \frac{n}{2}$$


7.28. With $q = 1 - p$,

$$E[X] = \sum_{i=1}^{\infty} P\{X \geq i\} = \sum_{i=1}^{n} P\{X \geq i\} = \sum_{i=1}^{n} q^{i-1} = \frac{1 - q^n}{p}$$


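7.29.
$$\begin{aligned}
\mathrm{Cov}(X, Y) &= E[XY] - E[X]E[Y] \\
&= P(X = 1, Y = 1) - P(X = 1)P(Y = 1)
\end{aligned}$$

Hence,

$$\mathrm{Cov}(X, Y) = 0 \iff P(X = 1, Y = 1) = P(X = 1)P(Y = 1)$$

Because

$$\mathrm{Cov}(X, Y) = \mathrm{Cov}(1 - X, 1 - Y) = -\mathrm{Cov}(1 - X, Y) = -\mathrm{Cov}(X, 1 - Y)$$

the preceding shows that all of the following are equivalent when X and Y are Bernoulli:

1. $\mathrm{Cov}(X, Y) = 0$
2. $P(X = 1, Y = 1) = P(X = 1)P(Y = 1)$
3. $P(1 - X = 1, 1 - Y = 1) = P(1 - X = 1)P(1 - Y = 1)$
4. $P(1 - X = 1, Y = 1) = P(1 - X = 1)P(Y = 1)$
5. $P(X = 1, 1 - Y = 1) = P(X = 1)P(1 - Y = 1)$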


7.30. Number the individuals, and let $X_{i,j}$ equal 1 if the $j$th individual who has hat size $i$ chooses a hat of that size, and let $X_{i,j}$ equal 0 otherwise. Then the number of individuals who choose a hat of their size is

$$X = \sum_{i=1}^{r}\sum_{j=1}^{n_i} X_{i,j}$$

Hence,

$$E[X] = \sum_{i=1}^{r}\sum_{j=1}^{n_i} E[X_{i,j}] = \sum_{i=1}^{r}\sum_{j=1}^{n_i}\frac{h_i}{n} = \frac{1}{n}\sum_{i=1}^{r} h_i n_i$$


7.31. Letting $\sigma_x^2$ and $\sigma_y^2$ be, respectively, the variances of X and of Y, we obtain, upon squaring both sides, the equivalent inequality

$$\mathrm{Var}(X + Y) \leq \sigma_x^2 + \sigma_y^2 + 2\sigma_x\sigma_y$$

Using that $\mathrm{Var}(X + Y) = \sigma_x^2 + \sigma_y^2 + 2\,\mathrm{Cov}(X, Y)$, the preceding inequality becomes $\mathrm{Cov}(X, Y) \leq \sigma_x\sigma_y$, or equivalently

$$\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_x\sigma_y} \leq 1$$

which has already been established.


7.32. Noting that X is equal to $i$ plus the number of the values $R_{n+1},\ldots,R_{n+m}$ that are smaller than the $i$th smallest of $R_1,\ldots,R_n$, it follows that if we let $I_{n+k}$ equal 1 if $R_{n+k}$ is smaller than that value and let it equal 0 otherwise, then

$$X = i + \sum_{k=1}^{m} I_{n+k}$$

Taking expectations gives that

$$E[X] = i + \sum_{k=1}^{m} E[I_{n+k}]$$

Now,

$$\begin{aligned}
E[I_{n+k}] &= P(R_{n+k} < i\text{th smallest of } R_1,\ldots,R_n) \\
&= P(R_{n+k} \text{ is one of the } i \text{ smallest of the values } R_1,\ldots,R_n, R_{n+k}) \\
&= \frac{i}{n + 1}
\end{aligned}$$

where the final equality used that $R_{n+k}$ is equally likely to be either the smallest, the second smallest, ..., or the $(n + 1)$st smallest of the values $R_1,\ldots,R_n, R_{n+k}$. Hence,

$$E[X] = i + m\,\frac{i}{n + 1}$$
7.33. (a) $E[X] = \int_0^1 E[X \mid Y = y]\,dy = \int_0^1 \frac{y}{2}\,dy = 1/4$

(b) $E[XY] = \int_0^1 E[XY \mid Y = y]\,dy = \int_0^1 \frac{y^2}{2}\,dy = 1/6$, which gives that $\mathrm{Cov}(X, Y) = 1/6 - 1/8 = 1/24$

(c) $E[X^2] = \int_0^1 E[X^2 \mid Y = y]\,dy = \int_0^1 \frac{y^2}{3}\,dy = 1/9$, giving that $\mathrm{Var}(X) = \frac{1}{9} - \frac{1}{16} = \frac{7}{144}$

(d)
$$\begin{aligned}
P\{X \leq x\} &= \int_0^1 P\{X \leq x \mid Y = y\}\,dy \\
&= \int_0^x P\{X \leq x \mid Y = y\}\,dy + \int_x^1 P\{X \leq x \mid Y = y\}\,dy \\
&= \int_0^x dy + \int_x^1 \frac{x}{y}\,dy \\
&= x - x\log(x)
\end{aligned}$$

(e) Differentiate part (d) to obtain the density $f_X(x) = -\log(x)$, $0 < x < 1$.


Chapter 8


8.1. Let X denote the number of sales made next week, and note that X is integral. From Markov's inequality, we obtain the following:

(a) $P\{X > 18\} = P\{X \geq 19\} \leq \dfrac{E[X]}{19} = 16/19$

(b) $P\{X > 25\} = P\{X \geq 26\} \leq \dfrac{E[X]}{26} = 16/26$

8.2. (a)
$$\begin{aligned}
P\{10 \leq X \leq 22\} &= P\{|X - 16| \leq 6\} \\
&= P\{|X - \mu| \leq 6\} \\
&= 1 - P\{|X - \mu| > 6\} \\
&\geq 1 - 9/36 = 3/4
\end{aligned}$$

(b) $P\{X \geq 19\} = P\{X - 16 \geq 3\} \leq \dfrac{9}{9 + 9} = 1/2$

In part (a), we used Chebyshev's inequality; in part (b), we used its one-sided version. (See Proposition 5.1.)


8.3. First note that $E[X - Y] = 0$ and

$$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X, Y) = 28$$

Using Chebyshev's inequality in part (a) and the one-sided version in parts (b) and (c) gives the following results:

(a) $P\{|X - Y| > 15\} \leq 28/225$

(b) $P\{X - Y > 15\} \leq \dfrac{28}{28 + 225} = 28/253$

(c) $P\{Y - X > 15\} \leq \dfrac{28}{28 + 225} = 28/253$

8.4. If X is the number produced at factory A and Y the number produced at factory B, then

$$E[Y - X] = -2,\qquad \mathrm{Var}(Y - X) = 36 + 9 = 45$$

$$P\{Y - X > 0\} = P\{Y - X \geq 1\} = P\{Y - X + 2 \geq 3\} \leq \frac{45}{45 + 9} = 45/54$$
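8.5. Note first that

$$E[X_i] = \int_0^1 2x^2\,dx = 2/3$$

Now use the strong law of large numbers to obtain, with $S_n = X_1 + \cdots + X_n$,

$$\lim_{n\to\infty}\frac{n}{S_n} = \lim_{n\to\infty}\frac{1}{S_n/n} = \frac{1}{\lim_{n\to\infty} S_n/n} = 1/(2/3) = 3/2$$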


8.6. Because $E[X_i] = 2/3$ and

$$E[X_i^2] = \int_0^1 2x^3\,dx = 1/2$$

we have $\mathrm{Var}(X_i) = 1/2 - (2/3)^2 = 1/18$. Thus, if there are $n$ components on hand, then

$$\begin{aligned}
P\{S_n \geq 35\} &= P\{S_n \geq 34.5\} \quad\text{(the continuity correction)} \\
&= P\left\{\frac{S_n - 2n/3}{\sqrt{n/18}} \geq \frac{34.5 - 2n/3}{\sqrt{n/18}}\right\} \\
&\approx P\left\{Z \geq \frac{34.5 - 2n/3}{\sqrt{n/18}}\right\}
\end{aligned}$$

where Z is a standard normal random variable. Since

$$P\{Z > -1.284\} = P\{Z < 1.284\} \approx .90$$

we see that $n$ should be chosen so that

$$(34.5 - 2n/3) \approx -1.284\sqrt{n/18}$$

A numerical computation gives the result $n = 55$.


8.7. If X is the time required to service a machine, then

$$E[X] = .2 + .3 = .5$$

Also, since the variance of an exponential random variable is equal to the square of its mean, we have

$$\mathrm{Var}(X) = (.2)^2 + (.3)^2 = .13$$

Therefore, with $X_i$ being the time required to service job $i$, $i = 1,\ldots,20$, and Z being a standard normal random variable, it follows that

$$\begin{aligned}
P\{X_1 + \cdots + X_{20} < 8\} &= P\left\{\frac{X_1 + \cdots + X_{20} - 10}{\sqrt{2.6}} < \frac{8 - 10}{\sqrt{2.6}}\right\} \\
&\approx P\{Z < -1.24035\} \\
&\approx .1074
\end{aligned}$$


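8.8. Note first that if X is the gambler's winnings on a single bet, then

$$E[X] = -.7 - .4 + 1 = -.1,\qquad E[X^2] = .7 + .8 + 10 = 11.5 \implies \mathrm{Var}(X) = 11.49$$

Therefore, with Z having a standard normal distribution,

$$\begin{aligned}
P\{X_1 + \cdots + X_{100} \leq -.5\} &= P\left\{\frac{X_1 + \cdots + X_{100} + 10}{\sqrt{1149}} \leq \frac{-.5 + 10}{\sqrt{1149}}\right\} \\
&\approx P\{Z \leq .2803\} \\
&\approx .6104
\end{aligned}$$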
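8.9. Using the notation of Problem 8.7, we have

$$\begin{aligned}
P\{X_1 + \cdots + X_{20} < t\} &= P\left\{\frac{X_1 + \cdots + X_{20} - 10}{\sqrt{2.6}} < \frac{t - 10}{\sqrt{2.6}}\right\} \\
&\approx P\left\{Z < \frac{t - 10}{\sqrt{2.6}}\right\}
\end{aligned}$$

Now, $P\{Z < 1.645\} \approx .95$, so $t$ should be such that

$$\frac{t - 10}{\sqrt{2.6}} \approx 1.645$$

which yields $t \approx 12.65$.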

8.10. If the claim were true, then, by the central limit theorem, the average nicotine content (call it X) would approximately have a normal distribution with mean 2.2 and standard deviation .03. Thus, the probability that it would be as high as 3.1 is

$$\begin{aligned}
P\{X \geq 3.1\} &= P\left\{\frac{X - 2.2}{.03} \geq \frac{3.1 - 2.2}{.03}\right\} \\
&\approx P\{Z \geq 30\} \\
&\approx 0
\end{aligned}$$

where Z is a standard normal random variable.

8.11. (a) If we arbitrarily number the batteries and let $X_i$ denote the life of battery $i$, $i = 1,\ldots,40$, then the $X_i$ are independent and identically distributed random variables. To compute the mean and variance of the life of, say, battery 1, we condition on its type. Letting $I$ equal 1 if battery 1 is type A and letting it equal 0 if it is type B, we have

$$E[X_1 \mid I = 1] = 50,\qquad E[X_1 \mid I = 0] = 30$$

yielding

$$E[X_1] = 50P\{I = 1\} + 30P\{I = 0\} = 50(1/2) + 30(1/2) = 40$$

In addition, using the fact that $E[W^2] = (E[W])^2 + \mathrm{Var}(W)$, we have

$$E[X_1^2 \mid I = 1] = (50)^2 + (15)^2 = 2725,\qquad E[X_1^2 \mid I = 0] = (30)^2 + 6^2 = 936$$

yielding

$$E[X_1^2] = (2725)(1/2) + (936)(1/2) = 1830.5$$

Thus, $X_1,\ldots,X_{40}$ are independent and identically distributed random variables having mean 40 and variance $1830.5 - 1600 = 230.5$. Hence, with $S = \sum_{i=1}^{40} X_i$, we have

$$E[S] = 40(40) = 1600,\qquad \mathrm{Var}(S) = 40(230.5) = 9220$$

and the central limit theorem yields

$$\begin{aligned}
P\{S > 1700\} &= P\left\{\frac{S - 1600}{\sqrt{9220}} > \frac{1700 - 1600}{\sqrt{9220}}\right\} \\
&\approx P\{Z > 1.041\} \\
&= 1 - \Phi(1.041) = .149
\end{aligned}$$

(b) For this part, let $S_A$ be the total life of all the type A batteries and let $S_B$ be the total life of all the type B batteries. Then, by the central limit theorem, $S_A$ has approximately a normal distribution with mean $20(50) = 1000$ and variance $20(225) = 4500$, and $S_B$ has approximately a normal distribution with mean $20(30) = 600$ and variance $20(36) = 720$. Because the sum of independent normal random variables is also a normal random variable, it follows that $S_A + S_B$ is approximately normal with mean 1600 and variance 5220. Consequently, with $S = S_A + S_B$,

$$\begin{aligned}
P\{S > 1700\} &= P\left\{\frac{S - 1600}{\sqrt{5220}} > \frac{1700 - 1600}{\sqrt{5220}}\right\} \\
&\approx P\{Z > 1.384\} \\
&= 1 - \Phi(1.384) = .084
\end{aligned}$$
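The two normal approximations in Problem 8.11 differ only in the variance of the total life S. The sketch below recomputes both tail probabilities from the stated means (50 and 30) and standard deviations (15 and 6); the helper name `normal_tail` is illustrative, and the results may differ from the table-based figures in the third decimal place.

```python
from math import erf, sqrt

def normal_tail(x, mean, var):
    """P{N(mean, var) > x}."""
    return 0.5 * (1.0 - erf((x - mean) / sqrt(2.0 * var)))

# (a) each battery's type is random: per-battery variance is E[X^2] - 40^2 = 230.5
var_a = 40 * (0.5 * (50**2 + 15**2) + 0.5 * (30**2 + 6**2) - 40**2)
p_a = normal_tail(1700, 1600, var_a)

# (b) exactly 20 batteries of each type: the variances simply add
var_b = 20 * 15**2 + 20 * 6**2
p_b = normal_tail(1700, 1600, var_b)
```

Fixing the number of each type removes the variability due to the random type mix, which is why `var_b` (5220) is smaller than `var_a` (9220) and the tail probability drops.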


8.12. Let $N$ denote the number of doctors who volunteer. 
Conditional on the event $N = i$, the number of patients 
seen is distributed as the sum of $i$ independent Poisson random 
variables with common mean 30. Because the sum of 
independent Poisson random variables is also a Poisson random 
variable, it follows that the conditional distribution of 
$X$ given that $N = i$ is Poisson with mean $30i$. Therefore, 

$$E[X \mid N] = 30N, \qquad \mathrm{Var}(X \mid N) = 30N$$

As a result, 

$$E[X] = E[E[X \mid N]] = 30E[N] = 90$$

Also, by the conditional variance formula, 

$$\begin{aligned}
\mathrm{Var}(X) &= E[\mathrm{Var}(X \mid N)] + \mathrm{Var}(E[X \mid N]) \\
&= 30E[N] + (30)^2\,\mathrm{Var}(N)
\end{aligned}$$

Because 

$$\mathrm{Var}(N) = \frac{1}{3}(2^2 + 3^2 + 4^2) - 9 = 2/3$$

we obtain $\mathrm{Var}(X) = 690$. 

To approximate $P\{X > 65\}$, we would not be justified in 
assuming that the distribution of $X$ is approximately that of 
a normal random variable with mean 90 and variance 690. 
What we do know, however, is that 

$$P\{X > 65\} = \sum_{i=2}^{4} P\{X > 65 \mid N = i\}P\{N = i\} = \frac{1}{3}\sum_{i=2}^{4} P_i(65)$$

where $P_i(65)$ is the probability that a Poisson random variable 
with mean $30i$ is greater than 65. That is, 

$$P_i(65) = 1 - \sum_{j=0}^{65} e^{-30i}(30i)^j / j!$$


Because a Poisson random variable with mean $30i$ has the 
same distribution as does the sum of $30i$ independent Poisson 
random variables with mean 1, it follows from the central 
limit theorem that its distribution is approximately normal 
with mean and variance equal to $30i$. Consequently, with 
$X_i$ being a Poisson random variable with mean $30i$ and $Z$ 
being a standard normal random variable, we can approximate 
$P_i(65)$ as follows: 

$$\begin{aligned}
P_i(65) &= P\{X_i > 65\} \\
&= P\{X_i \ge 65.5\} \\
&= P\left\{\frac{X_i - 30i}{\sqrt{30i}} \ge \frac{65.5 - 30i}{\sqrt{30i}}\right\} \\
&\approx P\left\{Z \ge \frac{65.5 - 30i}{\sqrt{30i}}\right\}
\end{aligned}$$

Therefore, 

$$P_2(65) \approx P\{Z \ge .7100\} \approx .2389$$

$$P_3(65) \approx P\{Z \ge -2.583\} \approx .9951$$

$$P_4(65) \approx P\{Z \ge -4.975\} \approx 1$$

leading to the result 

$$P\{X > 65\} \approx .7447$$


If we had mistakenly assumed that $X$ was approximately 
normal, we would have obtained the approximate 
answer .8244. (The exact probability is .7440.) 
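
The three figures in this solution (the conditional normal approximation, the single-normal approximation, and the exact value) can be compared directly. The sketch below is illustrative only; the function names are assumptions, and it takes $N$ to be equally likely to be 2, 3, or 4, as the variance computation above implies.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def poisson_tail(mean, k):
    """Exact P{X > k} for X ~ Poisson(mean), summing the mass function up to k."""
    term = math.exp(-mean)      # P{X = 0}
    cdf = term
    for j in range(1, k + 1):
        term *= mean / j        # P{X = j} obtained from P{X = j - 1}
        cdf += term
    return 1.0 - cdf

def poisson_tail_normal(mean, k):
    """Continuity-corrected normal approximation to P{X > k}."""
    return 1.0 - normal_cdf((k + 0.5 - mean) / math.sqrt(mean))

doctors = (2, 3, 4)  # assumed: N is equally likely to be 2, 3, or 4
exact = sum(poisson_tail(30 * i, 65) for i in doctors) / 3
approx = sum(poisson_tail_normal(30 * i, 65) for i in doctors) / 3
# The single-normal approximation with mean 90 and variance 690.
naive = 1.0 - normal_cdf((65.5 - 90) / math.sqrt(690))
print(exact, approx, naive)
```

Conditioning on $N$ before applying the normal approximation keeps each piece close to a genuine Poisson distribution, which is why `approx` tracks `exact` much more closely than `naive` does.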


8.13. Take logarithms and then apply the strong law of large 
numbers to obtain 

$$\log\left[\left(\prod_{i=1}^{n} X_i\right)^{1/n}\right] = \frac{1}{n}\sum_{i=1}^{n} \log(X_i) \to E[\log(X_i)]$$

Therefore, 

$$\left(\prod_{i=1}^{n} X_i\right)^{1/n} \to e^{E[\log(X_i)]}$$
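
As a quick numerical illustration of this limit, the sketch below takes the $X_i$ to be uniform on $(0, 1)$, which is an assumption made only for the example. For that choice $E[\log(X_i)] = -1$, so the geometric mean should settle near $e^{-1}$.

```python
import math
import random

def geometric_mean(values):
    """Geometric mean computed through logarithms to avoid underflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

rng = random.Random(2024)   # fixed seed, chosen arbitrarily
n = 200_000
# 1 - random() lies in (0, 1], so the logarithm is always defined.
sample = [1.0 - rng.random() for _ in range(n)]
estimate = geometric_mean(sample)
limit = math.exp(-1.0)      # e^{E[log U]} with E[log U] = -1
print(estimate, limit)
```

Working with the average of logarithms, rather than multiplying the values directly, mirrors the argument in the solution and avoids a product that would underflow to zero for large $n$.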


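8.14. Let $X_i$ be the time it takes to process book $i$, and let 
$S_n = \sum_{i=1}^{n} X_i$. 
(a) With $Z$ being a standard normal, 

$$\begin{aligned}
P\{S_{40} > 420\} &= P\left\{\frac{S_{40} - 400}{\sqrt{40 \cdot 9}} > \frac{420 - 400}{\sqrt{40 \cdot 9}}\right\} \\
&\approx P\left\{Z > \frac{20}{\sqrt{360}}\right\} \approx .1459
\end{aligned}$$

(b) 

$$\begin{aligned}
P\{S_{25} \le 240\} &= P\left\{\frac{S_{25} - 250}{\sqrt{25 \cdot 9}} \le \frac{240 - 250}{\sqrt{25 \cdot 9}}\right\} \\
&\approx P\left\{Z \le -\frac{10}{15}\right\} \approx .2525
\end{aligned}$$

We have assumed that the successive book processing times 
are independent. 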


8.15. Let $P(X = i) = 1/n$, $i = 1, \ldots, n$. Also, let $f(x) = a_x$ 
and $g(x) = b_x$. Then $f$ and $g$ are both increasing functions 
and so $E[f(X)g(X)] \ge E[f(X)]E[g(X)]$, which is equivalent 
to 

$$n\sum_{i=1}^{n} a_i b_i \ge \left(\sum_{i=1}^{n} a_i\right)\left(\sum_{i=1}^{n} b_i\right)$$
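
The inequality $n\sum a_i b_i \ge (\sum a_i)(\sum b_i)$ for two increasing sequences is easy to check on small inputs. The sketch below is only a numerical sanity check with made-up sequences; the function name `chebyshev_sum_gap` is illustrative.

```python
def chebyshev_sum_gap(a, b):
    """Return n * sum(a_i * b_i) - sum(a) * sum(b) for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    n = len(a)
    return n * sum(x * y for x, y in zip(a, b)) - sum(a) * sum(b)

a = [1, 2, 4, 7]     # increasing
b = [0, 3, 3, 10]    # nondecreasing
gap = chebyshev_sum_gap(a, b)             # nonnegative when both sequences increase
opposite = chebyshev_sum_gap([1, 2, 3], [3, 2, 1])  # oppositely ordered sequences
print(gap, opposite)
```

When one sequence increases and the other decreases, the gap turns negative, which matches the fact that $f(X)$ and $g(X)$ are then negatively correlated.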


Chapter 9 

9.1. From axiom (iii), it follows that the number of events 
that occur between times 8 and 10 has the same distribution 
as the number of events that occur by time 2 and thus is a 
Poisson random variable with mean 6. Hence, we obtain the 
following solutions for parts (a) and (b): 

(a) $P\{N(10) - N(8) = 0\} = e^{-6}$ 

(b) E[N(10) — N(8)] = 6 

(c) It follows from axioms (ii) and (iii) that from any point 
in time onward, the process of events occurring is a Poisson 
process with rate $\lambda = 3$. Hence, the expected time of the 
fifth event after 2 P.M. is $2 + E[S_5] = 2 + 5/3$. That is, the 
expected time of this event is 3:40 P.M. 


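9.2. (a) 

$$\begin{aligned}
P\{N(1/3) = 2 \mid N(1) = 2\} &= \frac{P\{N(1/3) = 2, N(1) = 2\}}{P\{N(1) = 2\}} \\
&= \frac{P\{N(1/3) = 2, N(1) - N(1/3) = 0\}}{P\{N(1) = 2\}} \\
&= \frac{P\{N(1/3) = 2\}P\{N(1) - N(1/3) = 0\}}{P\{N(1) = 2\}} \quad \text{(by axiom (ii))} \\
&= \frac{P\{N(1/3) = 2\}P\{N(2/3) = 0\}}{P\{N(1) = 2\}} \quad \text{(by axiom (iii))} \\
&= \frac{e^{-\lambda/3}(\lambda/3)^2/2! \; e^{-2\lambda/3}}{e^{-\lambda}\lambda^2/2!} \\
&= 1/9
\end{aligned}$$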


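(b) 

$$\begin{aligned}
P\{N(1/2) \ge 1 \mid N(1) = 2\} &= 1 - P\{N(1/2) = 0 \mid N(1) = 2\} \\
&= 1 - \frac{P\{N(1/2) = 0, N(1) = 2\}}{P\{N(1) = 2\}} \\
&= 1 - \frac{P\{N(1/2) = 0, N(1) - N(1/2) = 2\}}{P\{N(1) = 2\}} \\
&= 1 - \frac{P\{N(1/2) = 0\}P\{N(1) - N(1/2) = 2\}}{P\{N(1) = 2\}} \\
&= 1 - \frac{P\{N(1/2) = 0\}P\{N(1/2) = 2\}}{P\{N(1) = 2\}} \\
&= 1 - \frac{e^{-\lambda/2}e^{-\lambda/2}(\lambda/2)^2/2!}{e^{-\lambda}\lambda^2/2!} \\
&= 1 - 1/4 = 3/4
\end{aligned}$$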
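
Both conditional probabilities in 9.2 are free of the rate $\lambda$, which a short computation can confirm. The following sketch evaluates the final ratios for a few arbitrary rates; the function names are illustrative and not from the text.

```python
import math

def poisson_pmf(k, mean):
    """P{X = k} for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def prob_two_by_third(lam):
    """P{N(1/3) = 2 | N(1) = 2} for a Poisson process with rate lam."""
    return poisson_pmf(2, lam / 3) * poisson_pmf(0, 2 * lam / 3) / poisson_pmf(2, lam)

def prob_at_least_one_by_half(lam):
    """P{N(1/2) >= 1 | N(1) = 2} for a Poisson process with rate lam."""
    return 1.0 - poisson_pmf(0, lam / 2) * poisson_pmf(2, lam / 2) / poisson_pmf(2, lam)

for lam in (0.5, 1.0, 4.0):   # arbitrary rates, chosen only for illustration
    print(lam, prob_two_by_third(lam), prob_at_least_one_by_half(lam))
```

The answers agree with treating the two event times, given $N(1) = 2$, as independent uniform points on $(0, 1)$: both fall in the first third with probability $(1/3)^2$, and both fall in the second half with probability $(1/2)^2$.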


9.3. Fix a point on the road and let $X_n$ equal 0 if the $n$th vehicle 
to pass is a car and let it equal 1 if it is a truck, $n \ge 1$. We 
now suppose that the sequence $X_n$, $n \ge 1$, is a Markov chain 
with transition probabilities 

$$P_{00} = 5/6, \quad P_{01} = 1/6, \quad P_{10} = 4/5, \quad P_{11} = 1/5$$

Then the long-run proportions are the solution of 

$$\begin{aligned}
\pi_0 &= \pi_0(5/6) + \pi_1(4/5) \\
\pi_1 &= \pi_0(1/6) + \pi_1(1/5) \\
\pi_0 + \pi_1 &= 1
\end{aligned}$$

Solving the preceding equations gives 

$$\pi_0 = 24/29, \qquad \pi_1 = 5/29$$

Thus, $2400/29 \approx 83$ percent of the vehicles on the road 
are cars. 


9.4. The successive weather classifications constitute a 
Markov chain. If the states are 0 for rainy, 1 for sunny, and 2 
for overcast, then the transition probability matrix is as follows: 

$$P = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$$

The long-run proportions satisfy 

$$\begin{aligned}
\pi_0 &= \pi_1(1/3) + \pi_2(1/3) \\
\pi_1 &= \pi_0(1/2) + \pi_1(1/3) + \pi_2(1/3) \\
\pi_2 &= \pi_0(1/2) + \pi_1(1/3) + \pi_2(1/3) \\
1 &= \pi_0 + \pi_1 + \pi_2
\end{aligned}$$

The solution of the preceding system of equations is 

$$\pi_0 = 1/4, \quad \pi_1 = 3/8, \quad \pi_2 = 3/8$$

Hence, three-eighths of the days are sunny and one-fourth 
are rainy. 
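
Long-run proportions of this kind can also be approximated numerically by applying the transition matrix repeatedly to any starting distribution. The sketch below is a minimal illustration rather than the method used in the solution; it assumes the chain is irreducible and aperiodic (true for both chains in 9.3 and 9.4), and the function name and iteration count are arbitrary choices.

```python
def long_run_proportions(P, iterations=200):
    """Approximate the vector pi satisfying pi = pi P by repeated multiplication."""
    n = len(P)
    pi = [1.0 / n] * n                      # start from the uniform distribution
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Weather chain of 9.4: states 0 = rainy, 1 = sunny, 2 = overcast.
weather = [
    [0.0, 1 / 2, 1 / 2],
    [1 / 3, 1 / 3, 1 / 3],
    [1 / 3, 1 / 3, 1 / 3],
]
# Vehicle chain of 9.3: states 0 = car, 1 = truck.
vehicles = [
    [5 / 6, 1 / 6],
    [4 / 5, 1 / 5],
]
pi_weather = long_run_proportions(weather)
pi_vehicles = long_run_proportions(vehicles)
print(pi_weather, pi_vehicles)
```

For small chains like these, solving the linear equations by hand is just as easy; the iteration becomes more useful when the state space is too large for that.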


9.5. (a) A direct computation yields 

$$H(X)/H(Y) \approx 1.06$$

(b) Both random variables take on two of their values with 
the same probabilities .35 and .05. The difference is that if 
they do not take on either of those values, then $X$, but not $Y$, 
is equally likely to take on any of its three remaining possible 
values. Hence, from Theoretical Exercise 9.13, we would 
expect the result of part (a). 


Chapter 10 
10.1. (a) $1 = C\int_0^1 e^x\,dx \Rightarrow C = 1/(e - 1)$ 
(b) $F(x) = C\int_0^x e^y\,dy = \dfrac{e^x - 1}{e - 1}$, $0 \le x \le 1$ 
Hence, if we let $X = F^{-1}(U)$, then 

$$U = \frac{e^X - 1}{e - 1}$$

or 

$$X = \log(U(e - 1) + 1)$$

Thus, we can simulate the random variable $X$ by generating a 
random number $U$ and then setting $X = \log(U(e - 1) + 1)$. 
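
The inverse transform recipe translates into a few lines of code. This is a hedged sketch: the names `sample_x` and `cdf` are illustrative, and the seed and sample size are arbitrary. The sample mean is compared with $E[X] = C\int_0^1 x e^x\,dx = 1/(e - 1)$, a value worked out here rather than taken from the text.

```python
import math
import random

def sample_x(u):
    """Inverse transform: map a random number u to X = log(u(e - 1) + 1)."""
    return math.log(u * (math.e - 1.0) + 1.0)

def cdf(x):
    """F(x) = (e^x - 1)/(e - 1) for 0 <= x <= 1."""
    return (math.exp(x) - 1.0) / (math.e - 1.0)

rng = random.Random(7)   # arbitrary fixed seed
draws = [sample_x(rng.random()) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean, 1.0 / (math.e - 1.0))
```

Applying `cdf` to the output of `sample_x` returns the original random number, which is exactly the identity $U = F(X)$ used in the derivation.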


10.2. Use the acceptance-rejection method with $g(x) = 1$, 
$0 < x < 1$. Calculus shows that the maximum value of 
$f(x)/g(x)$ occurs at a value of $x$, $0 < x < 1$, such that 

$$2x - 6x^2 + 4x^3 = 0$$

or, equivalently, when 

$$4x^2 - 6x + 2 = (4x - 2)(x - 1) = 0$$

The maximum thus occurs when $x = 1/2$, and it follows that 

$$C = \max f(x)/g(x) = 30(1/4 - 2/8 + 1/16) = 15/8$$

Hence, the algorithm is as follows: 

Step 1. Generate a random number $U_1$. 

Step 2. Generate a random number $U_2$. 

Step 3. If $U_2 \le 16(U_1^2 - 2U_1^3 + U_1^4)$, set $X = U_1$; else 
return to Step 1. 
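
The three steps can be sketched in code as follows. The density is taken to be $f(x) = 30(x^2 - 2x^3 + x^4)$ on $(0, 1)$, which is inferred from the expression for $C$ above rather than quoted from the problem statement; the function name, seed, and sample size are illustrative.

```python
import random

def sample_by_rejection(rng):
    """Acceptance-rejection draw from f(x) = 30(x^2 - 2x^3 + x^4) on (0, 1)."""
    while True:
        u1 = rng.random()   # Step 1: candidate value from g(x) = 1
        u2 = rng.random()   # Step 2: random number for the acceptance test
        # Step 3: accept with probability f(u1) / (C g(u1)), where C = 15/8.
        if u2 <= 16.0 * (u1**2 - 2.0 * u1**3 + u1**4):
            return u1

rng = random.Random(11)     # arbitrary fixed seed
draws = [sample_by_rejection(rng) for _ in range(20_000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean)
```

Because the density is symmetric about $1/2$, the sample mean should be near $0.5$. On average the loop runs $C = 15/8$ times per accepted value.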

10.3. It is most efficient to check the higher probability values 
first, as in the following algorithm: 

Step 1. Generate a random number $U$. 

Step 2. If $U \le .35$, set $X = 3$ and stop. 

Step 3. If $U \le .65$, set $X = 4$ and stop. 

Step 4. If $U \le .85$, set $X = 2$ and stop. 

Step 5. $X = 1$. 
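
Written as a function of the random number, the algorithm might look like the sketch below. The probabilities in the comments (.35, .30, .20, .15) are read off from the thresholds and are an inference, not a quotation of the problem statement; the function name is illustrative.

```python
def sample_from_uniform(u):
    """Map a random number u in [0, 1) to X, testing the likelier values first."""
    if u <= 0.35:
        return 3    # P{X = 3} = .35
    if u <= 0.65:
        return 4    # P{X = 4} = .30
    if u <= 0.85:
        return 2    # P{X = 2} = .20
    return 1        # P{X = 1} = .15

print([sample_from_uniform(u) for u in (0.10, 0.50, 0.70, 0.90)])
```

With this ordering the expected number of comparisons is $.35(1) + .30(2) + .20(3) + .15(3) = 2.0$, compared with $2.5$ if the values were tested in the order 1, 2, 3, 4.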
10.4. $2\mu - X$ 
10.5. (a) Generate $2n$ independent exponential random 
variables with mean 1, $X_i, Y_i$, $i = 1, \ldots, n$, and then use the 
estimator $\sum_{i=1}^{n} e^{X_i Y_i}/n$. 
(b) We can use $XY$ as a control variate to obtain an estimator 
of the type 

$$\sum_{i=1}^{n} (e^{X_i Y_i} + cX_i Y_i)/n$$

Another possibility would be to use $XY + X^2Y^2/2$ as the 
control variate and so obtain an estimator of the type 

$$\sum_{i=1}^{n} (e^{X_i Y_i} + c[X_i Y_i + X_i^2 Y_i^2/2 - 1/2])/n$$

The motivation behind the preceding formula is based on the 
fact that the first three terms of the MacLaurin series expansion 
of $e^{xy}$ are $1 + xy + (x^2y^2)/2$. 

Common Discrete Distributions 


Bernoulli(p) X indicates whether a trial that results in a success with probability 
p is a success or not. 

$$P\{X = 1\} = p, \qquad P\{X = 0\} = 1 - p$$

$E[X] = p$, $\mathrm{Var}(X) = p(1 - p)$. 

Binomial(n, p) X represents the number of successes in n independent trials when 
each trial is a success with probability p. 

$$P\{X = i\} = \binom{n}{i} p^i (1 - p)^{n-i}, \quad i = 0, 1, \ldots, n$$

$E[X] = np$, $\mathrm{Var}(X) = np(1 - p)$. 

Note. Binomial(1, p) = Bernoulli(p). 

Geometric(p) X is the number of trials needed to obtain a success when each trial 
is independently a success with probability p. 

$$P\{X = i\} = p(1 - p)^{i-1}, \quad i = 1, 2, \ldots$$

$E[X] = \frac{1}{p}$, $\mathrm{Var}(X) = \frac{1 - p}{p^2}$. 

Negative Binomial(r, p) X is the number of trials needed to obtain a total of r 
successes when each trial is independently a success with probability p. 

$$P\{X = i\} = \binom{i - 1}{r - 1} p^r (1 - p)^{i-r}, \quad i = r, r + 1, r + 2, \ldots$$

$E[X] = \frac{r}{p}$, $\mathrm{Var}(X) = \frac{r(1 - p)}{p^2}$. 

Notes. 

1. Negative Binomial(1, p) = Geometric(p). 

2. Sum of r independent Geometric(p) random variables is Negative Binomial(r, p). 

Poisson($\lambda$) X is used to model the number of events that occur over a set interval 
when these events are either independent or weakly dependent and each has a 
small probability of occurrence. 

$$P\{X = i\} = e^{-\lambda}\lambda^i / i!, \quad i = 0, 1, 2, \ldots$$

$E[X] = \lambda$, $\mathrm{Var}(X) = \lambda$. 

Notes. 

1. A Poisson random variable X with parameter $\lambda = np$ provides a good approximation 
to a Binomial(n, p) random variable when n is large and p is small. 

2. If events are occurring one at a time in a random manner for which (a) the 
number of events that occur in disjoint time intervals is independent and (b) the 
probability of an event occurring in any small time interval is approximately $\lambda$ 
times the length of the interval, then the number of events in an interval of length 
t will be a Poisson($\lambda t$) random variable. 

Hypergeometric(m, N - m, n) X is the number of white balls in a random sample 
of n balls chosen without replacement from an urn of N balls of which m are white. 

$$P\{X = i\} = \frac{\binom{m}{i}\binom{N - m}{n - i}}{\binom{N}{n}}, \quad i = 0, 1, 2, \ldots$$

The preceding uses the convention that $\binom{r}{j} = 0$ if either $j < 0$ or $j > r$. 

With $p = m/N$, $E[X] = np$, $\mathrm{Var}(X) = \frac{N - n}{N - 1}\,np(1 - p)$. 

Note. If each ball were replaced before the next selection, then X would be a 
Binomial(n, p) random variable. 


Negative Hypergeometric X is the number of balls that need be removed from an 
urn that contains n + m balls, of which n are white, until a total of r white balls 
has been removed, where $r \le n$. 

$E[X] = r\,\frac{n + m + 1}{n + 1}$, $\mathrm{Var}(X) = \frac{mr(n + 1 - r)(n + m + 1)}{(n + 1)^2(n + 2)}$. 
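
The negative hypergeometric moments can be checked exactly on small urns. The sketch below assumes the standard mass function $P\{X = k\} = \binom{k-1}{r-1}\binom{n+m-k}{n-r}/\binom{n+m}{n}$, $k = r, \ldots, m + r$, and the standard formulas $E[X] = r(n+m+1)/(n+1)$ and $\mathrm{Var}(X) = mr(n+1-r)(n+m+1)/[(n+1)^2(n+2)]$; the mass function is not stated in the summary above, and all function names are illustrative.

```python
from fractions import Fraction
from math import comb

def neg_hypergeom_pmf(k, n, m, r):
    """P{X = k}: the r-th white ball appears on draw k (n white, m other balls)."""
    return Fraction(comb(k - 1, r - 1) * comb(n + m - k, n - r), comb(n + m, n))

def moments(n, m, r):
    """Exact mean and variance of X computed from the mass function."""
    support = range(r, m + r + 1)
    mean = sum(k * neg_hypergeom_pmf(k, n, m, r) for k in support)
    second = sum(k * k * neg_hypergeom_pmf(k, n, m, r) for k in support)
    return mean, second - mean**2

def formula_moments(n, m, r):
    """Closed-form mean and variance (standard formulas, assumed here)."""
    mean = Fraction(r * (n + m + 1), n + 1)
    var = Fraction(m * r * (n + 1 - r) * (n + m + 1), (n + 1) ** 2 * (n + 2))
    return mean, var

print(moments(3, 4, 2), formula_moments(3, 4, 2))
```

Using `Fraction` keeps the comparison exact, so any disagreement between the enumeration and the closed forms would show up as an inequality rather than as rounding noise.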


Common Continuous Distributions 


Uniform(a, b) X is equally likely to be near each value in the interval (a, b). Its 
density function is 

$$f(x) = \frac{1}{b - a}, \quad a < x < b$$

$E[X] = \frac{a + b}{2}$, $\mathrm{Var}(X) = \frac{(b - a)^2}{12}$. 


Normal($\mu, \sigma^2$) X is a random fluctuation arising from many causes. Its density 
function is 

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2/2\sigma^2}, \quad -\infty < x < \infty$$

$E[X] = \mu$, $\mathrm{Var}(X) = \sigma^2$. 

Notes. 

1. When $\mu = 0$, $\sigma = 1$, X is called a standard normal. 

2. If X is Normal($\mu, \sigma^2$), then $Z = \frac{X - \mu}{\sigma}$ is standard normal. 

3. Sum of independent normal random variables is also normal. 

4. An important result is the central limit theorem, which states that the distribution 
of the sum of the first n of a sequence of independent and identically distributed 
random variables becomes normal as n goes to infinity, for any distribution of these 
random variables that has a finite mean and variance. 

Exponential($\lambda$) X is the waiting time until an event occurs when events are occurring 
at a rate $\lambda > 0$. Its density is 

$$f(x) = \lambda e^{-\lambda x}, \quad x > 0$$

$E[X] = \frac{1}{\lambda}$, $\mathrm{Var}(X) = \frac{1}{\lambda^2}$, $P\{X > x\} = e^{-\lambda x}$, $x > 0$. 

Note. X is memoryless, in that the remaining life of an item whose life distribution 
is Exponential($\lambda$) is also Exponential($\lambda$), no matter what the current age of the 
item is. 


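Gamma($\alpha, \lambda$) When $\alpha = n$, X is the waiting time until n events occur when events 
are occurring at a rate $\lambda > 0$. Its density is 

$$f(t) = \frac{\lambda e^{-\lambda t}(\lambda t)^{\alpha - 1}}{\Gamma(\alpha)}, \quad t > 0$$

where $\Gamma(\alpha) = \int_0^\infty e^{-x}x^{\alpha - 1}\,dx$ is called the gamma function. 

$E[X] = \frac{\alpha}{\lambda}$, $\mathrm{Var}(X) = \frac{\alpha}{\lambda^2}$. 

Notes. 

1. Gamma($1, \lambda$) = Exponential($\lambda$). 

2. If the random variables are independent, then the sum of a Gamma($\alpha_1, \lambda$) and 
a Gamma($\alpha_2, \lambda$) is a Gamma($\alpha_1 + \alpha_2, \lambda$). 

3. The sum of n independent and identically distributed exponentials with parameter 
$\lambda$ is a Gamma($n, \lambda$) random variable. 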

Beta(a, b) X is the distribution of a random variable taking on values in the interval 
(0, 1). Its density is 

$$f(x) = \frac{x^{a-1}(1 - x)^{b-1}}{B(a, b)}, \quad 0 < x < 1$$

where $B(a, b) = \int_0^1 x^{a-1}(1 - x)^{b-1}\,dx$ is called the beta function. 

$E[X] = \frac{a}{a + b}$, $\mathrm{Var}(X) = \frac{ab}{(a + b)^2(a + b + 1)}$. 

Notes. 

1. Beta(1, 1) = Uniform(0, 1). 

2. The $j$th smallest of n independent Uniform(0, 1) random variables is a 
Beta(j, n - j + 1) random variable. 

Chi-Squared(n) X is the sum of the squares of n independent standard normal 
random variables. Its density is 

$$f(x) = \frac{e^{-x/2}x^{\frac{n}{2} - 1}}{2^{n/2}\,\Gamma(n/2)}, \quad x > 0$$

$E[X] = n$, $\mathrm{Var}(X) = 2n$. 

Notes. 

1. Chi-Squared(n) = Gamma(n/2, 1/2). 

2. The sample variance of n independent and identically distributed Normal($\mu, \sigma^2$) 
random variables multiplied by $\frac{n - 1}{\sigma^2}$ is a Chi-Squared(n - 1) random variable, and 
it is independent of the sample mean. 

Cauchy X is the tangent of a uniformly distributed random angle between $-\pi/2$ 
and $\pi/2$. Its density is 

$$f(x) = \frac{1}{\pi(1 + x^2)}, \quad -\infty < x < \infty$$

$E[X]$ is undefined. 


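Pareto($\lambda, a$) If Y is exponential with rate $\lambda$ and $a > 0$, then $X = ae^{Y}$ is said to be 
Pareto with parameters $\lambda$ and $a$. Its density is 

$$f(x) = \lambda a^{\lambda} x^{-(\lambda + 1)}, \quad x > a$$

When $\lambda > 1$, $E[X] = \frac{\lambda a}{\lambda - 1}$, and when $\lambda > 2$, $\mathrm{Var}(X) = \frac{\lambda a^2}{(\lambda - 2)(\lambda - 1)^2}$. 

Note. The conditional distribution of X given that it exceeds $x_0 > a$ is Pareto($\lambda, x_0$). 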
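
The construction of a Pareto random variable from an exponential one gives a direct way to simulate it. The following sketch is illustrative: it generates the exponential by the usual inverse transform $Y = -\log(U)/\lambda$, uses the tail formula $P\{X > x\} = (a/x)^{\lambda}$ for $x > a$ (which follows from $X = ae^{Y}$ but is not stated in the summary above), and picks the parameter values, seed, and function names arbitrarily.

```python
import math
import random

def pareto_from_uniform(u, lam, a):
    """X = a * e^Y, where Y = -log(u)/lam is exponential with rate lam."""
    y = -math.log(u) / lam
    return a * math.exp(y)

def pareto_survival(x, lam, a):
    """P{X > x} = (a/x)^lam for x > a."""
    return (a / x) ** lam

rng = random.Random(3)      # arbitrary fixed seed
lam, a = 5.0, 2.0           # illustrative parameters with lam > 2
# 1 - random() lies in (0, 1], which keeps log() defined.
draws = [pareto_from_uniform(1.0 - rng.random(), lam, a) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
theoretical_mean = lam * a / (lam - 1.0)
print(sample_mean, theoretical_mean)
```

A fairly large $\lambda$ is used here because the Pareto tail is heavy: for $\lambda$ close to 2 the variance is very large and the sample mean converges slowly.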


This is a special edition of an established title widely used by colleges and 
universities throughout the world. Pearson published this exclusive edition 
for the benefit of students outside the United States and Canada. If you 
purchased this book within the United States or Canada, you should be aware 
that it has been imported without the approval of the Publisher or Author. 


A First Course in Probability offers an elementary introduction to the theory of 
probability for students of mathematics, statistics, engineering, and the sciences. 
This text not only equips students with the mathematics of probability theory 
but also teaches them the many possible applications of probability. 


The text includes intuitive explanations that are supported by an abundance of 
examples to build students’ interest in probability. Three sets of chapter-end 
exercises—Problems, Theoretical Exercises, and Self-Test Problems and 
Exercises—appear in the text. Self-Test Problems and Exercises feature complete 
solutions in the appendix, allowing students to test their comprehension and to 
prepare for exams. 


The tenth edition introduces new material on the Pareto distribution (Section 5.6.5), 
on Poisson limit results (Section 8.5), and on the Lorenz curve (Section 8.7), as well 
as many new and updated problems and exercises. 

